Abstract

Audits to detect policy violations, coupled with punishments, are essential to manage risks stemming from inappropriate information use by authorized insiders in organizations that handle large volumes of personal information (e.g., in the healthcare, finance, and Web services sectors). Our main result is an audit mechanism that effectively manages organizational risks by balancing the cost of audit and punishment against the expected loss from policy violations. We model the interaction between an organization (defender) and an employee (adversary) as a suitable repeated game. We assume that the defender is fully rational and the adversary is near-rational (i.e., acts rationally with high probability and in a byzantine manner otherwise). The mechanism prescribes a strategy for the defender that, when paired with the adversary's best response to it, yields an asymmetric subgame perfect equilibrium. This equilibrium concept, which we define, implies that the defender's strategy is approximately optimal (she might only gain a small bounded amount of utility by deviating) while the adversary does not gain at all from deviating from her best response strategy. We provide evidence that a number of parameters in the game model can be estimated from prior empirical studies, suggest specific studies that can help estimate other parameters, and design a learning algorithm that the defender can use to provably learn the adversary's private incentives. Finally, we use our model to predict observed practices in industry (e.g., differences in punishment rates of doctors and nurses for the same violation) and the effectiveness of policy interventions (e.g., data breach notification laws and government audits) in encouraging organizations to conduct more thorough audits.


1 Introduction

The importance of audit and accountability mechanisms to detect policy violations and punish violators has been recognized in computer security [35] as well as in recent public policy discussions on privacy protection [14, 53]. Specifically, experts from privacy enforcement agencies, industry, civil society and academia have recently developed a series of white papers on accountability-based privacy governance in which one recommendation is that organisations should have in place policies and procedures for enforcement of internal data protection rules, and personnel who disregard those rules or misappropriate or misuse data should be subject to sanctions, including dismissal [14]. Indeed, such violations and sanctions are routinely reported in the healthcare sector [1, 28, 33, 42], and we are beginning to see the emergence of commercial audit tools to assist in the process of detecting violations [19].

The central scientific question that this state of affairs raises is how to design effective audit and punishment schemes. This paper articulates a desirable property and presents an audit mechanism that provably achieves that property against a class of adversaries. The high-level observation here is that audits coupled with punishments constitute a mechanism for managing risks: a suitable audit mechanism effectively manages organizational risks by balancing the cost of audit and punishment against the expected loss from policy violations.

*This work was partially supported by the U.S. Army Research Office contract “Perpetually Available and Secure Information Systems” (DAAD19-02-1-0389) to Carnegie Mellon CyLab, the NSF Science and Technology Center TRUST, the NSF CyberTrust grant “Privacy, Compliance and Information Risk in Complex Organizational Processes,” the AFOSR MURI “Collaborative Policies and Assured Information Sharing,” and HHS Grant no. HHS 90TR0003/01. Jeremiah Blocki was also partially supported by an NSF Graduate Fellowship. Arunesh Sinha was also partially supported by the CMU CIT Bertucci Fellowship. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

At a technical level, we model the interaction between the organization (the defender) and its employee (the adversary) as a repeated extensive form game with imperfect information (the adversary's actions are not observable to the defender) and public signals (the outcome of the audit, i.e., the number of violations detected, as well as the rate of inspection and the punishment level, is revealed publicly). The game model (described in Section 3) augments the model in previous work [10] by incorporating the incentives of rational adversaries. Adversaries benefit from violations they commit (e.g., by selling personal data) and suffer due to punishments imposed for detected violations. We refer to the benefit derived by the adversary from committing violations as her personal benefit and assume that it is initially hidden from the defender. In order to account for adversaries who are generally rational but may sometimes act irrationally, we consider near-rational adversaries who choose actions that maximize their (expected) utility with probability $1 - \epsilon$ and act arbitrarily with probability $\epsilon$. This model is inspired by the trembling hand assumption from game theory [21]. The model also includes a loss for the organization if the punishment rate is set too high (to capture loss in productivity resulting, e.g., from employee dismissal, rehiring and training [2, 19]), in addition to the cost of inspecting and the loss due to policy violations. The model generalizes from the situation in which the defender interacts with a single adversary to one where she interacts with multiple, non-colluding adversaries via a natural product game construction that we define.

Our main contribution is an audit and accountability mechanism that proceeds in two phases for each audit cycle: a detection and estimation phase and an audit phase. For the audit phase, we design a strategy for the defender that, paired with the adversary's best response to it, yields an asymmetric subgame perfect equilibrium. This equilibrium concept, which we define, implies that the defender's strategy is approximately optimal (she might only gain a small bounded amount of utility by deviating) while the adversary does not gain at all from deviating from her best response strategy (see Section 4). We define this equilibrium concept by adapting the standard notion of approximate subgame perfect equilibrium, which has a symmetric flavor and permits both players to obtain small gains by unilaterally deviating from their equilibrium strategy. We believe that the symmetric equilibrium concept is unsuitable for our security application, where an adversary who deviates motivated by a small gain could cause a big loss for the organization. This asymmetry is also indicative of the nature of the audit game: it is not a game between peers, but one in which the defender has greater power since she gets to decide the rate of inspection and the punishment level.

For the detection and estimation phase, we rely on standard techniques used in risk assessment [2, 23, 30, 40, 44, 49] to estimate the parameters of the game model. We provide evidence that a number of parameters in the game model can be estimated from prior studies [17, 46, 48, 57] and suggest specific studies that can help estimate other parameters. In addition, we design a learning algorithm (described in Section 5) that the defender can use to provably learn the adversary's personal benefit parameter. The learning algorithm works in a setting where training data points are labeled adversarially [3, 32]. In general, it is impossible to guarantee any learning if the adversary can label training data points arbitrarily; thus, some constraints have to be imposed on the adversary for such learning to work [3, 32]. We use a novel game-theoretic argument to impose such constraints, since the impatient near-rational adversary acts in a manner that maximizes her immediate utility. The defender interacts with the adversary, iteratively adjusting the rate of inspection and the level of punishment, to provably learn the adversary's personal benefit parameter. This technique is quite general and can be used in other adversarial machine learning settings.

Finally, we use our model to predict observed practices in industry (e.g., differences in punishment rates of doctors and nurses for the same violation) and the effectiveness of policy interventions (e.g., data breach notification laws and government audits) in encouraging organizations to conduct more thorough audits (see Section 6). We present comparisons to additional related work in Section 7 and conclusions and directions for future work in Section 8.

2 Overview

In this section, we provide an overview of our model using a motivating scenario that will serve as a running example for this paper. Consider a “Hospital X” with employees in different roles (doctors, nurses, accountants). X has an internal policy that mandates weekly HIPAA compliance audits, notably to ensure that accesses to personal health records are legitimate. Given budget constraints, X cannot check every single access. The first step in the audit process is to analyze the access logs using an automated tool that ranks accesses by their probability of being a violation. Hospital X assesses the (monetary) impact of different types of violations and decides what subset to focus on by balancing the cost of audit and the expected impact (“risk”) from policy violations. This type of audit mechanism is common in practice [30, 40, 44, 49].

We provide a game model for this audit process that incorporates behavioral factors in risk assessment. We assume that employees are rational: while they are not trying to disrupt their organization's business, they may violate certain policies if they benefit from doing so. The organization (e.g., Hospital X) is a rational entity that is trying to maximize its expected utility, i.e., to balance audit costs against risks from employee non-compliance.

More precisely, an employee (“adversary,” A) executes tasks, i.e., actions that are permitted as part of their job. We only consider tasks that can later be audited, e.g., through inspection of logs. For example, in X the tasks are accesses to health records. A's tasks can be divided into legitimate tasks and violations of a policy. Different types of violations may have different impacts on the organization. We assume that there are K different types of violations that A can commit. The economic impact of a violation on the organization depends on its type. Examples of violations of different types in Hospital X include inappropriate access to a celebrity's health record, or access to a health record leading to identity theft. A benefits by committing violations: the benefit is quantifiable using information from existing studies, human judgment, or the algorithmic technique we propose in Section 5. For example, reports [2, 17] indicate that on average the personal benefit of a hospital employee from selling a common person's health record is $50. On the other hand, if A is caught committing a violation, then she is punished according to the punishment policy used by the organization D. In the case of Hospital X, employees could be terminated, as happened in similar recent incidents [33, 42].

The organization D can classify each of the adversary's tasks by type. However, D cannot determine with certainty whether a particular task is legitimate or a violation without investigating. Furthermore, D cannot inspect all of A's tasks due to budgetary constraints. As such, some violations may go undetected internally but could be detected externally. Governmental audits, whistle-blowing, and patient complaints [45, 57] are all examples of situations that could lead to external detection of violations. Externally detected violations usually cause more economic damage than internally caught violations. For instance, as indicated in the 2011 Ponemon Institute report [48], a patient whose privacy has been violated is probably more likely to leave (and possibly sue) a hospital if they discover the violation on their own than if the hospital detects the violation and proactively notifies the patient.

The economic impact of a violation is a combination of direct and indirect costs: direct costs include breach notification and remedial costs, and indirect costs include loss of customers and brand value. For example, the 2010 Ponemon Institute report [46] states that the average cost of a privacy breach per record in health care is $301, with indirect costs corresponding to about two thirds of that amount. Of course, certain violations may result in much higher direct costs, e.g., $25,000 per record (up to $250,000 in total) in fines alone in the state of California [42]. While these amounts may incentivize organizations to use aggressive audits, they have to be balanced with the fact that severe punishment policies result in a hostile work environment, low employee motivation and failure to attract new talent, causing economic losses for the organization [12].

In other words, the organization needs to balance auditing costs, potential economic damages due to violations, and the economic impact of the punishment policy. The employees need to weigh their gain from violating policies against their loss from getting caught by an audit and punished. The actions of one party impact the actions of the other party: if employees never violate, the organization does not need to audit; likewise, if the organization never audits, employees can violate policies with total impunity. Given this strategic interdependency, we model the auditing process as a repeated game between the organization and its employees, where the game repeats over discrete rounds corresponding to audit cycles. The game is parameterized by quantifiable variables such as the personal benefit of the employee, the cost of a breach, and the cost of auditing, among others.

3 Model

We begin by providing a high-level view of the audit process (Section 3.1), before describing the audit game in detail (Section 3.2). Finally, we describe the estimation and detection of the parameters of the audit game (Section 3.3).
3.1 The Audit Process

In practice, the organization is not playing a repeated audit game against a specific employee, but against all of its n employees at the same time. However, if we assume that 1) a given employee's actions for a type of task are independent of her actions for other types, and that 2) employees do not collude with other employees and act independently, we can decompose the overall game into nK independent base repeated games that the organization plays in parallel. One base repeated game corresponds to a given type of access k by a given employee A, and will be denoted by $G_{A,k}$. Each game $G_{A,k}$ is described using many parameters, such as the loss due to violations and the personal benefit for the employee. We abuse notation in using $G_{A,k}$ to refer to a base repeated game of type k with any value of the parameters.

In our proposed audit process, the organization follows the steps listed below in each audit cycle for every game $G_{A,k}$. We assume that, the first time auditing is performed, the parameters of the game have already been estimated and the equilibrium audit strategy has been computed.

before audit:

1. If any parameter changes, go to step 2; else go to audit.

2. Estimate parameters. Compute the equilibrium of $G_{A,k}$.

audit:

3. Audit using the actions of the computed equilibrium.

Note that the parameters of $G_{A,k}$ may change in any given round of the game, resulting in a different game. However, neither D nor A knows when that will happen. As such, since the horizon of $G_{A,k}$ with a fixed set of parameters is infinite, we can describe the interaction between the organization and its employees with an infinitely repeated game for the period in which the parameters are unchanged (see [21] for details). Thus, the game $G_{A,k}$ is an infinitely repeated game of imperfect information, since A's action is not directly observed. Instead, noisy information about the action, called a public signal, is observed. The public signal here consists of a) the detected violations, b) the number of tasks performed by A, and c) D's action. The K parallel games played between A and D can be composed in a natural manner into one repeated game (which we call $G_A$) by taking the product of the action spaces and adding up the utilities from the games.
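The product construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the per-type action spaces and utility callables below are hypothetical placeholders, and only a single round is shown.

```python
from itertools import product

def product_action_space(action_spaces):
    """Joint action space of G_A: Cartesian product of the K per-type action spaces."""
    return list(product(*action_spaces))

def product_game_utility(base_utilities, joint_action):
    """One-round utility in G_A: the sum of the K base-game utilities.

    base_utilities[k] is a (hypothetical) callable giving the utility of the
    base game for type k, evaluated on that game's own action joint_action[k].
    """
    if len(base_utilities) != len(joint_action):
        raise ValueError("need exactly one action per base game")
    return sum(u(a) for u, a in zip(base_utilities, joint_action))

# Illustrative K = 2 example: each per-type action is (tasks, violations), and the
# utility is a made-up per-violation benefit times the number of violations.
spaces = [[(2, 0), (2, 1)], [(1, 0), (1, 1)]]
benefits = [5.0, 3.0]
utils = [lambda act, b=b: b * act[1] for b in benefits]
print(len(product_action_space(spaces)))              # 4
print(product_game_utility(utils, ((2, 1), (1, 1))))  # 8.0
```

Summing utilities is only valid under the two independence assumptions stated above (no cross-type dependence and no collusion).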

3.2 Formal Description

In the remainder of this section, we focus on the base repeated games $G_{A,k}$. We use the following notation in this paper:

• Vectors are represented with an arrow on top, e.g., $\vec{v}$ is a vector. The $i$th component of a vector is given by $\vec{v}(i)$. $\vec{v} \le \vec{a}$ means that both vectors have the same number of components and, for any component $i$, $\vec{v}(i) \le \vec{a}(i)$.

• Random variables are represented in boldface, e.g., $\mathbf{x}$ and $\mathbf{X}$ are random variables.

• $E(\mathbf{X})[q, r]$ denotes the expected value of random variable $\mathbf{X}$ when particular parameters of the probability mass function of $\mathbf{X}$ are set to $q$ and $r$.

• We will use a shorthand form by dropping $A$, $k$ and the vector notation, as we assume these are implicitly understood for the game $G_{A,k}$. That is, a quantity $\vec{x}_A(k)$ will be simply denoted as $x$. We use this form whenever the context is restricted to game $G_{A,k}$ only.

$G_{A,k}$ is fully defined by the players, the time granularity at which the game is played, the actions the players can take, and the utility the players obtain as a result of the actions they take. We next discuss these concepts in turn.

Players: The game $G_{A,k}$ is played between the organization D and an adversary A. For instance, the players are Hospital X and a nurse in X.

Round of play: In practice, audits for all employees and all types of access are performed together and usually periodically. Thus, we adopt a discrete-time model, where time points are associated with rounds. Each round of play corresponds to an audit cycle. We group together all of the adversary's actions (tasks of a given type) in a given round. All games $G_{A,k}$ are synchronized, that is, a given round t is played simultaneously in all games.

Adversary action space: In each round, the adversary A chooses two quantities of type k: the number of tasks she performs, and the number of such tasks that are violations. If we denote by $U_k$ the maximum number of type-k tasks that any employee can perform, then A's entire action space for $G_{A,k}$ is given by $\mathcal{A}_k \times \mathcal{V}_k$ with $\mathcal{A}_k = \{u_k, \ldots, U_k\}$ ($u_k \le U_k$) and $\mathcal{V}_k = \{1, \ldots, U_k\}$. Let $\vec{a}^t_A$ and $\vec{v}^t_A$ be vectors of length K such that the components of vector $\vec{a}^t_A$ are the number of tasks of each type that A performs at time t, and the components of vector $\vec{v}^t_A$ are the number of violations of each type. Since violations are a subset of all tasks, we always have $\vec{v}^t_A \le \vec{a}^t_A$. In a given audit cycle, A's action in the game $G_{A,k}$ is defined by $(\vec{a}^t_A(k), \vec{v}^t_A(k))$, that is, $(a^t, v^t)$ in shorthand form, with $a^t \in \mathcal{A}_k$ and $v^t \in \mathcal{V}_k$.

Instead of being perfectly rational, we model A as playing with a trembling hand [21]. Whenever A chooses to commit $v^t$ violations in a given round t, she does so with probability $1 - \epsilon_{th}$, but with (small) probability $\epsilon_{th}$ she commits some other number of violations sampled from an unknown distribution $D_\epsilon$ over all possible violations. In other words, we allow A to act completely arbitrarily when she makes a mistake. For instance, a nurse in X may lose her laptop containing health records, leading to a breach.
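The realized behavior of a near-rational adversary can be simulated as follows. This is a sketch under stated assumptions: `mistake_dist` is a hypothetical stand-in for the unknown mistake distribution, and the uniform choice in the example is purely illustrative.

```python
import random

def trembling_hand_violations(intended_v, eps_th, mistake_dist, rng):
    """Violations actually committed by a near-rational adversary in one round.

    With probability 1 - eps_th she commits her intended number of violations;
    with probability eps_th she 'trembles' and the count is drawn from
    mistake_dist (a hypothetical stand-in for the unknown distribution).
    """
    if rng.random() < eps_th:
        return mistake_dist(rng)
    return intended_v

# Illustration: intended 0 violations, eps_th = 0.05, mistakes uniform on 0..10.
rng = random.Random(42)
draws = [trembling_hand_violations(0, 0.05, lambda r: r.randint(0, 10), rng)
         for _ in range(10_000)]
# A tremble can still land on 0, so this fraction is roughly 0.05 * 10/11.
print(sum(d != 0 for d in draws) / len(draws))
```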

Defender action space: D also chooses two quantities of type k in each round: the number of inspections to perform, and the punishment to levy for each type-k violation detected. Let $\vec{s}^t_A$ be the vector of length K such that the components of vector $\vec{s}^t_A$ are the number of inspections of each type that D performs in round t. The number of inspections that D can conduct is bounded by the number of tasks that A performs, and thus, $\vec{s}^t_A \le \vec{a}^t_A$. D uses a log analysis tool $\mathcal{M}$ to sort accesses according to the probability of them being a violation. Then, D chooses the top $\vec{s}^t_A(k) = s^t$ tasks from the sorted output of $\mathcal{M}$ to inspect in game $G_{A,k}$. Inspection is assumed perfect, i.e., if a violation is inspected, it is detected. The number of inspections is bounded by budgetary constraints. Denoting the function that outputs the cost of inspection for each type of violation by $\vec{C}$, we have $\vec{C}(k)(s^t) \le \vec{b}^t_A(k)$, where $\vec{b}^t_A(k)$ defines a per-employee, per-type budget constraint. The budget allocation problem is an optimization problem that depends on the audit strategy. We present this problem, assuming our proposed audit strategy, in Appendix B.3.
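The inspection step can be sketched as selecting the highest-ranked tasks, subject to inspecting no more tasks than were performed and staying within the budget. The suspicion scores are a hypothetical stand-in for the ranking produced by the log analysis tool, and the linear cost function in the example is an illustrative choice; the text only requires the cost function to be non-negative and increasing.

```python
def choose_inspections(scores, s, cost_fn, budget):
    """Indices of the s tasks to inspect, most suspicious first.

    scores:  per-task suspicion scores (stand-in for the log analysis tool's ranking).
    s:       number of inspections; must not exceed the number of tasks.
    cost_fn: inspection cost function C(k), mapping a count to a cost.
    budget:  per-employee, per-type budget b(k).
    """
    if not 0 <= s <= len(scores):
        raise ValueError("need 0 <= s <= number of tasks")
    if cost_fn(s) > budget:
        raise ValueError("inspection cost C(s) exceeds the budget")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:s]

# Hypothetical example: 4 tasks, linear cost of 5 per inspection, budget 20.
print(choose_inspections([0.1, 0.9, 0.4, 0.7], 2, lambda n: 5.0 * n, 20.0))  # [1, 3]
```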

D also chooses a punishment rate $\vec{P}^t_A(k) = P^t$ (fine per violation of type k) in each round t to punish A if violations of type k are detected. The punishment rate $P^t$ is bounded by a maximum punishment $P_f$, corresponding to the employee being fired and the game being terminated.

Finally, D's choice of the inspection action can depend only on A's total number of tasks, since the number of violations is not observed. Thus, D can choose its strategy as a function from the number of tasks to inspections and punishment even before A performs its action. In fact, we simulate D acting first, with observable actions, by requiring D to commit to a strategy and to provide a proof of honoring the commitment. Specifically, D computes its strategy, makes it public, and provides a proof of following the strategy after auditing is done. The proof can be provided by maintaining an audit trail of the audit process itself.
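One possible way to make such an audit trail checkable is a hash-chained log that a verifier replays against the published strategy. This is an illustrative assumption, not a construction from this paper: the SHA-256 chaining, the record fields, and the strategy table (number of tasks mapped to an inspection count and punishment rate) are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64

def _digest(prev_hash, entry):
    """Hash of a record, chained to the previous record's hash."""
    return hashlib.sha256((prev_hash + json.dumps(entry, sort_keys=True)).encode()).hexdigest()

def append_entry(trail, entry):
    """Append an audit record to the trail."""
    prev = trail[-1]["hash"] if trail else GENESIS
    trail.append({"entry": entry, "prev": prev, "hash": _digest(prev, entry)})

def verify_trail(trail, strategy):
    """True iff the chain is intact and every round followed the published strategy.

    strategy maps a number of tasks to the committed (inspections, punishment) pair.
    """
    prev = GENESIS
    for rec in trail:
        if rec["prev"] != prev or rec["hash"] != _digest(prev, rec["entry"]):
            return False
        e = rec["entry"]
        committed = strategy.get(e["tasks"])
        if committed is None or tuple(committed) != (e["inspections"], e["punishment"]):
            return False
        prev = rec["hash"]
    return True

# Hypothetical published strategy and two audited rounds.
strategy = {10: (3, 50.0), 20: (5, 50.0)}
trail = []
append_entry(trail, {"round": 1, "tasks": 10, "inspections": 3, "punishment": 50.0})
append_entry(trail, {"round": 2, "tasks": 20, "inspections": 5, "punishment": 50.0})
print(verify_trail(trail, strategy))   # True
trail[0]["entry"]["inspections"] = 2   # tamper with a recorded round
print(verify_trail(trail, strategy))   # False
```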

Outcomes: We define the outcome of a single round of $G_{A,k}$ as the number of violations detected in the internal audit and the number of violations detected externally. We assume that there is a fixed exogenous probability $p$ ($0 < p < 1$) of an internally undetected violation getting caught externally. Due to the probabilistic nature of all quantities, the outcome is a random variable. Let $\vec{\mathbf{O}}^t_A$ be the vector of length K such that $\vec{\mathbf{O}}^t_A(k) = \mathbf{O}^t$ represents the outcome for the $t$th round of the game $G_{A,k}$. Then $\mathbf{O}^t$ is a tuple $(\mathbf{O}^t_{int}, \mathbf{O}^t_{ext})$ of violations caught internally and externally. As stated earlier, we assume the use of a log analysis tool $\mathcal{M}$ to rank the accesses, with more likely violations being ranked higher. Then, the probability mass function for $\mathbf{O}^t_{int}$ is a distribution parameterized by $(a^t, v^t)$, $s^t$ and $\mathcal{M}$. The worst performance of $\mathcal{M}$ occurs when the $s^t$ accesses to be inspected are chosen at random, resulting in a hypergeometric distribution with mean $v^t \alpha^t$, where $\alpha^t = s^t/a^t$. We assume that the mean of the distribution is $\mu(\alpha^t) v^t \alpha^t$, where $\mu(\alpha^t)$ is a function of $\alpha^t$ that measures the performance of $\mathcal{M}$, and $\forall \alpha^t \in [0,1].\ \mu \ge \mu(\alpha^t) \ge 1$ for some constant $\mu$ ($\mu$ is overloaded here). Note that we must have $\mu(\alpha^t)\alpha^t \le 1$; further, we assume that $\mu(\alpha^t)$ is monotonically non-increasing in $\alpha^t$. The probability mass function for $\mathbf{O}^t_{ext}$ conditioned on $\mathbf{O}^t_{int}$ is a binomial distribution parameterized by $p$, with one trial for each of the $v^t - \mathbf{O}^t_{int}$ internally undetected violations.
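The outcome distribution in the worst case for the log analysis tool (inspected tasks chosen uniformly at random, so that $\mu(\alpha^t) = 1$) can be checked by simulation. The sketch assumes that each internally undetected violation is caught externally independently with probability $p$; the parameter values are illustrative only.

```python
import random

def simulate_outcome(a, v, s, p, rng):
    """One round's outcome (O_int, O_ext) when the s inspected tasks are chosen at random.

    O_int is hypergeometric with mean v * s / a; each of the v - O_int violations
    missed internally is caught externally independently with probability p,
    so O_ext given O_int is Binomial(v - O_int, p).
    """
    tasks = [1] * v + [0] * (a - v)    # 1 marks a violation
    o_int = sum(rng.sample(tasks, s))  # violations among the inspected tasks
    o_ext = sum(rng.random() < p for _ in range(v - o_int))
    return o_int, o_ext

# Illustrative check: a = 100 tasks, v = 10 violations, s = 30 (alpha = 0.3), p = 0.2.
rng = random.Random(7)
runs = [simulate_outcome(100, 10, 30, 0.2, rng) for _ in range(20_000)]
mean_int = sum(o for o, _ in runs) / len(runs)
mean_ext = sum(x for _, x in runs) / len(runs)
print(mean_int, mean_ext)  # close to v*alpha = 3.0 and p*v*(1 - alpha) = 1.4
```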

Utility functions: In a public signaling game like $G_{A,k}$, the utilities of the players depend only on the public signal and their own action, while the strategies they choose depend on the history of public signals [37]. The utility of the repeated game is defined as a ($\delta$-discounted) sum of the expected utilities received in each round, where the expectation is taken with respect to the distribution over histories. Let the discount factor for D be $\delta_D$ and for any employee A be $\delta_A$. We assume that D is patient, i.e., future rewards are almost as important as immediate rewards, and $\delta_D$ is close to 1. A is less patient than D, and hence $\delta_A < \delta_D$.
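The effect of the discount factors can be illustrated with a short computation. The payoff stream below (an immediate gain of 10 followed by five rounds of loss 4) is hypothetical; it only shows why a less patient player ($\delta_A < \delta_D$) weighs immediate gains more heavily than a patient one.

```python
def discounted_utility(per_round, delta):
    """Delta-discounted sum of per-round (expected) utilities, round 0 first."""
    return sum(u * delta ** t for t, u in enumerate(per_round))

stream = [10] + [-4] * 5                           # hypothetical: gain now, losses later
print(round(discounted_utility(stream, 0.5), 3))   # 6.125  (impatient player)
print(round(discounted_utility(stream, 0.99), 3))  # -9.408 (patient player)
```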

Defender utility function: D's utility in a round of the game $G_{A,k}$ is the negative of the sum of the cost of inspecting A's actions, the monetary loss from a high punishment rate for A, and the direct and indirect costs of violations. In essence, D has to find the right balance between inspecting with higher coverage (which incurs high costs), letting violations occur (which results in direct and indirect costs), and stifling employee productivity by setting a high punishment rate.




As discussed before, inspection costs are given by $C(s^t)$, where $C = \vec{C}(k)$ is a function denoting the cost of inspecting type-k tasks. Similarly, the monetary loss from losing the employee's productivity due to fear of punishment is given by $e(P^t)$, where $e = \vec{e}(k)$ is a function for type-k tasks. The functions in $\vec{C}$ and $\vec{e}$ must satisfy the following constraints: 1) they should be monotonically increasing in their argument, and 2) $\vec{C}(k) \ge 0$ and $\vec{e}(k) \ge 0$ for all k.

We characterize the effect of violations on the organization's indirect cost similarly to the reputation loss in previous work [10]. Additionally, the generic function described below is capable of capturing direct costs, as shown in the example following the function specification. Specifically, we define a function $r_k$ ($r$ in shorthand form) that, at time t, takes as input the number of type-k violations caught internally, the number of type-k violations caught externally, and a time horizon $\tau$, and outputs the overall loss at time $t + \tau$ due to these violations at time t. The function $r$ is stationary (i.e., independent of t), and externally caught violations have a stronger impact on $r$ than internally detected violations. Further, $r((0,0),\tau) = 0$ for any $\tau$ (undetected violations have 0 cost), and $r$ is monotonically decreasing in $\tau$ and becomes equal to zero for $\tau \ge m$ (violations are forgotten after a finite number of rounds). As in previous work [10], we construct the utility function at round t by immediately accounting for future losses due to violations occurring at time t. This allows us to use standard game-theoretic results while, at the same time, providing a close approximation of the defender's loss [10]. With this notation, D's utility at time t in $G_{A,k}$ is

$$\mathrm{Rew}^t_D\big((s^t, P^t), \mathbf{O}^t\big) = -\sum_{j=0}^{m-1} \delta_D^{\,j}\, r(\mathbf{O}^t, j) - C(s^t) - e(P^t). \tag{1}$$

This per-round utility is always non-positive. As is typical of security games (e.g., [24, 55] and related work), implementing security measures does not provide direct benefits to the defender, but is necessary to pare possible losses. Hence, the goal for the defender is to keep this utility as close to zero as possible.

The above function can capture direct costs of violations as an additive term at time $\tau = 0$. As a simple example [10], assuming the average direct costs for internally and externally caught violations are given by $R^D_{int}$ and $R^D_{ext}$, and the function $r$ is linear in the random variables $\mathbf{O}^t_{int}$ and $\mathbf{O}^t_{ext}$, $r$ can be given by

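$$r(\mathbf{O}^t, \tau) = \begin{cases} (c + R^D_{int})\,\mathbf{O}^t_{int} + (\psi c + R^D_{ext})\,\mathbf{O}^t_{ext} & \text{for } \tau = 0,\\ \delta^{\tau} c\,\big(\mathbf{O}^t_{int} + \psi\,\mathbf{O}^t_{ext}\big) & \text{for } 1 \le \tau < m,\\ 0 & \text{for } \tau \ge m, \end{cases}$$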

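where $\delta \in (0, 1)$ and $\psi \ge 1$. Then Eqn. (1) reduces to

$$\mathrm{Rew}^t_D\big((s^t, P^t), \mathbf{O}^t\big) = -R_{int}\,\mathbf{O}^t_{int} - R_{ext}\,\mathbf{O}^t_{ext} - C(s^t) - e(P^t), \tag{2}$$

with $R_{int} = R^I_{int} + R^D_{int}$, $R^I_{int} = c\,(1 - \delta^m \delta_D^m)/(1 - \delta\,\delta_D)$ and $R_{ext} = \psi R^I_{int} + R^D_{ext}$.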
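The reduction from Eqn. (1) to Eqn. (2) rests on the geometric sum $\sum_{j=0}^{m-1}(\delta\delta_D)^j = (1 - \delta^m\delta_D^m)/(1 - \delta\delta_D)$. The sketch below checks it numerically by summing the example loss term by term and comparing with the closed form; all parameter values are illustrative, not estimates from the studies cited above.

```python
import math

def example_r(o_int, o_ext, tau, *, c, psi, delta, m, rd_int, rd_ext):
    """The linear example loss r(O^t, tau)."""
    if tau == 0:
        return (c + rd_int) * o_int + (psi * c + rd_ext) * o_ext
    if tau < m:
        return delta ** tau * c * (o_int + psi * o_ext)
    return 0.0

def loss_eq1(o_int, o_ext, delta_d, **params):
    """Violation part of Eqn. (1): sum_{j=0}^{m-1} delta_D^j r(O^t, j)."""
    return sum(delta_d ** j * example_r(o_int, o_ext, j, **params)
               for j in range(params["m"]))

def loss_eq2(o_int, o_ext, delta_d, *, c, psi, delta, m, rd_int, rd_ext):
    """Violation part of Eqn. (2): R_int O_int + R_ext O_ext."""
    ri_int = c * (1 - delta ** m * delta_d ** m) / (1 - delta * delta_d)
    r_int = ri_int + rd_int
    r_ext = psi * ri_int + rd_ext
    return r_int * o_int + r_ext * o_ext

params = dict(c=100.0, psi=2.0, delta=0.8, m=6, rd_int=50.0, rd_ext=300.0)
lhs = loss_eq1(3, 1, 0.95, **params)
rhs = loss_eq2(3, 1, 0.95, **params)
print(math.isclose(lhs, rhs))  # True
```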

Adversary utility function: We define A's utility as A's personal benefit gained by committing violations minus the punishment incurred for detected violations. Personal benefit is a monetary measure of the benefit that an employee gets out of violations. It includes all kinds of benefits, e.g., curiosity, actual monetary benefit (by selling private data), revenge, etc. It is natural that the personal benefit of an employee is only known to that employee. Our model of the personal benefit of an employee A is linear and is defined by a rate of personal benefit for each type of violation, given by the vector $\vec{I}_A$ of length K. Further, we assume that the upper bounds on the private benefit are publicly known and given by $\vec{I}_{\max}$. The punishment is the vector $\vec{P}^t_A$ of length K chosen by D, as discussed above. Using shorthand notation, A's utility for the game $G_{A,k}$ is:

\[ \mathrm{Rew}^t_{\mathcal{A}}((\alpha^t, P^t), v^t, O^t) = I v^t - P^t\, (O^t_{int} + O^t_{ext}). \]

Observe that the utility function of a player depends on the public signal (observed violations, \(\mathcal{D}\)'s action) and on the action of the player, which conforms to the definition of a repeated game with imperfect information and public signaling. In such games, expected utilities are used to compute equilibria.

Let \(\alpha^t = s^t/u^t\) and \(\nu(\alpha^t) = \mu(\alpha^t)\, \alpha^t\). Then \(E(O^t_{int}) = \nu(\alpha^t)\, v^t\) and \(E(O^t_{ext}) = p\, v^t (1 - \nu(\alpha^t))\). The expected utilities in each round then become:

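\[ E(\mathrm{Rew}^t_{\mathcal{D}}) = -\sum_{j=0}^{m-1} \delta_{\mathcal{D}}^{j}\, E(r(O^t, j))[v^t, u^t, \alpha^t] - C(\alpha^t u^t) - e(P^t), \]

\[ E(\mathrm{Rew}^t_{\mathcal{A}}) = I v^t - P^t v^t \big(\nu(\alpha^t) + p\,(1 - \nu(\alpha^t))\big). \]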
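For concreteness, the expected detections and \(\mathcal{A}\)'s expected per-round utility can be computed directly. In this minimal sketch the detection function \(\nu\) is passed in as a plain callable, and the function names are illustrative only.

```python
from typing import Callable

def expected_detections(v: float, alpha: float, nu: Callable[[float], float], p: float):
    """Return (E[O_int], E[O_ext]) for v violations at inspection level alpha."""
    detected_internally = nu(alpha) * v
    detected_externally = p * v * (1 - nu(alpha))
    return detected_internally, detected_externally

def expected_reward_adversary(I, P, v, alpha, nu, p):
    """E(Rew_A) = I v - P v (nu(alpha) + p (1 - nu(alpha)))."""
    e_int, e_ext = expected_detections(v, alpha, nu, p)
    return I * v - P * (e_int + e_ext)

# Illustrative call: nu(alpha) = alpha, I = 6, P = 10, four violations, alpha = p = 0.5.
print(expected_reward_adversary(6, 10, 4, 0.5, lambda a: a, 0.5))  # -6.0
```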
Figure 1: Feasible audit space: non-deterred (×) and deterred (+) regions for \(I\) = $6. For \(I\) = $11, the deterred region is empty.

The expected utility of \(\mathcal{A}\) depends only on the level of inspection and not on the actual number of inspections. For the example loss function given by Eqn. (2), the expected utility of \(\mathcal{D}\) becomes:

\[ E(\mathrm{Rew}^t_{\mathcal{D}}) = -v^t \big(R_{int}\, \nu(\alpha^t) + R_{ext}\, p\, (1 - \nu(\alpha^t))\big) - C(\alpha^t u^t) - e(P^t). \]

In addition to the action-dependent utilities above, the players also receive a fixed utility every round: the salary for \(\mathcal{A}\), and the value generated by \(\mathcal{A}\) for \(\mathcal{D}\). \(P_f\) depends on these values and is calculated in Appendix B.2. Finally, the model parameters that may change over time are \(R_{ext}\), \(R_{int}\), \(p\) and \(I\), and the functions \(C\), \(e\) and \(\mu\).
Graphical representation: A graphical representation of the utilities helps illustrate the ideas presented in the next two sections (see Figure 1). Consider the two-dimensional plane \(R_{\alpha,P}\) spanned by \(\alpha^t\) and \(P^t\). We define a feasible audit space in \(R_{\alpha,P}\) given by \(0 \le \alpha^t \le 1\) and \(0 \le P^t \le P_f\). \(\mathcal{D}\)'s actions are points in the feasible region. The expected utility of the adversary in each round is given by \(v^t \big(I - P^t(\nu(\alpha^t) + p(1 - \nu(\alpha^t)))\big)\). Thus, the curve in \(R_{\alpha,P}\) given by \(I = P^t(\nu(\alpha^t) + p(1 - \nu(\alpha^t)))\) separates the regions of positive and negative expected utility for the adversary in each round. We call the region of positive expected utility inside the feasible region the non-deterred region, and the region of negative expected utility inside the feasible region the deterred region.
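A short sketch makes the two regions concrete. It borrows the instantiation of Section 3.3 (\(\nu(\alpha) = \mu\alpha - (\mu - 1)\alpha^2\) with \(\mu = 1.5\), \(P_f\) = $10), assumes \(p = 0.5\), and uses the grid increments of Section 6. Under these assumptions it agrees with the caption of Figure 1: the deterred region is non-empty for \(I\) = $6 and empty for \(I\) = $11.

```python
MU, P_EXT, P_MAX = 1.5, 0.5, 10.0  # mu and P_f from Section 3.3; p = 0.5 is assumed

def nu(alpha, mu=MU):
    """Detection function nu(alpha) = mu * alpha - (mu - 1) * alpha**2."""
    return mu * alpha - (mu - 1) * alpha ** 2

def expected_penalty(alpha, P, p=P_EXT):
    """Expected punishment per violation: P (nu(alpha) + p (1 - nu(alpha)))."""
    n = nu(alpha)
    return P * (n + p * (1 - n))

def is_deterred(I, alpha, P):
    """Deterred iff the expected utility per violation, I - penalty, is negative."""
    return expected_penalty(alpha, P) > I

def deterred_grid_points(I, step_alpha=0.05, step_P=0.5):
    """Grid points of the feasible space 0 <= alpha <= 1, 0 <= P <= P_f that deter."""
    points = []
    for i in range(round(1 / step_alpha) + 1):
        for j in range(round(P_MAX / step_P) + 1):
            alpha, P = i * step_alpha, j * step_P
            if is_deterred(I, alpha, P):
                points.append((alpha, P))
    return points

print(len(deterred_grid_points(6)) > 0, len(deterred_grid_points(11)) == 0)  # True True
```

Because \(\nu(\alpha) + p(1 - \nu(\alpha)) \le 1\) on the feasible space, the expected penalty never exceeds \(P_f\), so no grid point deters an employee whose benefit exceeds \(P_f\).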

3.3  Estimation  and  Detection 

In this subsection, we demonstrate that the parameters of the audit game can be estimated. We describe techniques for estimating and detecting changes in the parameters of the game \(G_{\mathcal{A},k}\), obtaining sample estimates in the process. Before turning to constant values, we state the functions that we use as concrete instances for the examples in this paper. We use simple linear functions for the audit cost (\(C(\alpha u) = C \alpha u\)) and for the punishment loss (\(e(P) = eP\)). The performance of the log analysis tool depends on the tool being used; we use a linear function for \(\mu(\cdot)\) to get \(\nu(\alpha) = \mu\alpha - (\mu - 1)\alpha^2\), where \(\mu\) is a constant. Further, we use the example loss function (with \(R_{int}\) and \(R_{ext}\)) stated in the last subsection. We note that our theorems work with any functions; the functions above are the simplest ones that satisfy the constraints stated in the last subsection. Next, we gather data from industry-wide studies to obtain sample estimates for the parameters.
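The functional forms above can be written out directly. The sketch below is illustrative: the linear \(\mu(\alpha) = \mu - (\mu - 1)\alpha\) is our reading of "a linear function for \(\mu(\cdot)\)" that yields the stated \(\nu\), and the default constants anticipate the values chosen below. It also checks properties this instantiation has for \(1 \le \mu \le 2\): \(\nu(0) = 0\), \(\nu(1) = 1\), \(\nu\) is increasing, and \(\nu(\alpha) \ge \alpha\), i.e., the tool detects at least the fraction \(\alpha\) of violations.

```python
C_AUDIT, E_PUNISH, MU = 50.0, 10.0, 1.5  # defaults anticipate the Section 3.3 values

def audit_cost(alpha, u, C=C_AUDIT):
    """C(alpha u) = C * alpha * u: linear cost of inspecting alpha * u tasks."""
    return C * alpha * u

def punishment_loss(P, e=E_PUNISH):
    """e(P) = e * P: linear loss to the organization from punishing at level P."""
    return e * P

def mu_linear(alpha, mu=MU):
    """Assumed linear tool performance with mu(0) = mu and mu(1) = 1."""
    return mu - (mu - 1) * alpha

def nu(alpha, mu=MU):
    """nu(alpha) = mu(alpha) * alpha = mu * alpha - (mu - 1) * alpha**2."""
    return mu_linear(alpha, mu) * alpha

grid = [k / 100 for k in range(101)]
print(nu(0.0), nu(1.0))                                     # 0.0 1.0
print(all(nu(a) >= a for a in grid))                        # True
print(all(nu(b) >= nu(a) for a, b in zip(grid, grid[1:])))  # True
```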

As stated in Section 2, the values of the direct and indirect costs of violations (the average of \(R_{int}\) and \(R_{ext}\) is $300 in healthcare [46]; a detailed breakdown is given in the ANSI report [2]), the maximum personal benefit \(I\) ($50 for medical records [2, 17]), etc. are available in studies. We assume \(I_{\max}\) = $50. Also, in the absence of studies quantitatively distinguishing externally and internally caught violations, we assume \(R_{int}\) = \(R_{ext}\) = $300. Many parameters depend on the employee, his role in the organization and the type of violation. Keeping track of violations and behavior within the organization offers a data source for estimating and detecting changes in these parameters. We choose non-extreme values for these parameters: \(e\) = $10, \(I\) = $6, \(\epsilon_{th} = 0.03\), \(\delta_{\mathcal{A}} = 0.4\) and \(U_k = 40\). Further, under certain assumptions we calculate \(P_f\) (in Appendix B.2) to get \(P_f\) = $10. Finally, the average cost of auditing \(C\) and the performance factor \(\mu\) of the log analysis tool should be known to \(\mathcal{D}\). We assume \(C\) = $50 and an intermediate tool performance of \(\mu = 1.5\).

Finally, detecting and estimating parameter changes may require statistical methods, data mining and learning techniques. For example, consider detecting a change in \(I\). The expected behavior of \(\mathcal{A}\) is determined by the equilibrium of the game (shown in the next section), but there is also the possibility of deviation with probability \(\epsilon_{th}\). Thus, this is a standard detection problem: a fixed-length history of employee actions can be used to obtain an estimate \(\hat{\epsilon}_{th}\), and if the difference between \(\hat{\epsilon}_{th}\) and \(\epsilon_{th}\) is statistically significant, then one can conclude that \(I\) has changed. Various methods [25] other than the simple one stated above can be used. We do not delve into the details of these methods, as that is beyond the scope of this paper, and estimating risk parameters has been studied extensively in many contexts [2, 30, 40, 44, 49]. Later, in Section 5, we present a novel learning method to estimate \(I\).
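The simple detection test described above might look as follows. This is a sketch under our own assumptions: each of the last \(N\) rounds is labelled either as consistent with the equilibrium prediction or as a deviation, deviations are modelled as independent events with probability \(\epsilon_{th}\), and we use a one-sided exact binomial test at a hypothetical significance level of 0.05. The function names are illustrative.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def benefit_may_have_changed(deviations: int, rounds: int, eps_th: float,
                             significance: float = 0.05) -> bool:
    """Flag a possible change in I when the deviation count is improbably high
    under the assumed per-round deviation probability eps_th (one-sided test)."""
    eps_hat = deviations / rounds  # estimate of the deviation rate
    if eps_hat <= eps_th:
        return False
    return binomial_upper_tail(deviations, rounds, eps_th) < significance

print(benefit_may_have_changed(3, 100, 0.03))   # False: about 3 deviations are expected
print(benefit_may_have_changed(12, 100, 0.03))  # True
```

A one-sided test is used here because, in this sketch, only an unusually high deviation rate is treated as evidence of a change; a two-sided test would be an equally valid choice.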

4  Equilibrium 

In this section, we define a suitable equilibrium concept for the audit game (Section 4.1) and present an approximately cost-optimal strategy for the defender such that the adversary's best response to that strategy results in the equilibrium being attained (Section 4.2). Recall that the equilibrium of the game is considered for a period in which the game parameters are fixed.

4.1  Equilibrium  Concepts 

We begin by introducing standard terminology from game theory. In a one-shot extensive-form game, players move in order. We assume player 1 moves first, followed by player 2. An extensive-form repeated game is one in which the round game is a one-shot extensive-form game. A history is a sequence of actions. Let \(H\) be the set of all possible histories. Let \(S_i\) be the action space of player \(i\). A strategy of player \(i\) is a function \(\sigma_i : H_i \to S_i\), where \(H_i \subseteq H\) are the histories in which player \(i\) moves. The utility in each round is given by \(r_i : S_1 \times S_2 \to \mathbb{R}\). The total utility is a \(\delta_i\)-discounted sum of the utilities of each round, normalized by \(1 - \delta_i\).
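As a small illustration of the normalization, the total utility of a stream of per-round utilities is \((1 - \delta_i)\sum_t \delta_i^t r_i^t\), so a constant stream has total utility close to its per-round value. The sketch truncates the infinite sum at the length of the given list, which is our simplification.

```python
def normalized_discounted_utility(rewards, delta):
    """(1 - delta) * sum_t delta**t * rewards[t], truncated to len(rewards) rounds."""
    return (1 - delta) * sum(delta ** t * r for t, r in enumerate(rewards))

print(normalized_discounted_utility([1.0, 1.0], 0.5))               # 0.75
print(round(normalized_discounted_utility([5.0] * 1000, 0.9), 6))   # 5.0
```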

The definition of strategies extends to extensive-form repeated games with public signals. We consider a special case here that resembles our audit game. Player 1 moves first and the action is observed by player 2; then player 2 moves, but that action may not be perfectly observed, instead resulting in a public signal. Let the space of public signals be \(Y\). In any round, the observed public signal is distributed according to the distribution \(\Delta Y(\cdot \mid s)\), i.e., \(\Delta Y(y \mid s)\) is the probability of seeing signal \(y\) when the action profile \(s\) is played. In these games, a history is defined as an alternating sequence of player 1's actions and public signals, ending in a public signal for histories in which player 1 moves and ending in player 1's move for histories in which player 2 moves. The actual utility in each round is given by the function \(r_i : S_i \times Y \to \mathbb{R}\). The total expected utility \(g_i\) is the expected normalized \(\delta_i\)-discounted sum of the utilities of each round, where the expectation is taken over the distribution over public signals and histories. For any history \(h\), the game to be played in the future after \(h\) is called the continuation game of \(h\), with total utility given by \(g_i(\sigma, h)\).

A strategy profile \((\sigma_1, \sigma_2)\) is a subgame perfect equilibrium (SPE) of a repeated game if it is a Nash equilibrium for all continuation games given by any history \(h\) [21]. One way of determining whether a strategy profile is an SPE is to check whether it satisfies the single-stage deviation property, that is, no unilateral deviation by any player in any single round is profitable. We define a natural extension of SPE, which we call asymmetric subgame perfect equilibrium (or \((\epsilon_1, \epsilon_2)\)-SPE), which encompasses SPE as a special case when \(\epsilon_1 = \epsilon_2 = 0\).

Definition 1. (\((\epsilon_1, \epsilon_2)\)-SPE) Denote the concatenation operator for histories as ;. A strategy profile \(\sigma\) is an \((\epsilon_1, \epsilon_2)\)-SPE if, for every history \(h\) in which player 1 has to play, given \(h' = h; \sigma_1(h)\) and \(h'' = h; s_1\),

\[ E(r_1(\sigma_1(h), y))[\sigma_1(h), \sigma_2(h')] + \delta_1 E(g_1(\sigma, h'; y))[\sigma_1(h), \sigma_2(h')] \ge E(r_1(s_1, y))[s_1, \sigma_2(h'')] + \delta_1 E(g_1(\sigma, h''; y))[s_1, \sigma_2(h'')] - \epsilon_1 \]

for all \(s_1\). For every history \(h\) in which player 2 has to play, given that \(a(h)\) is the last action by player 1 in \(h\), for all \(s_2\)

\[ E(r_2(\sigma_2(h), y))[a(h), \sigma_2(h)] + \delta_2 E(g_2(\sigma, h; y))[a(h), \sigma_2(h)] \ge E(r_2(s_2, y))[a(h), s_2] + \delta_2 E(g_2(\sigma, h; y))[a(h), s_2] - \epsilon_2. \]


We are particularly interested in \((\epsilon_1, 0)\)-SPE, where player 1 is the defender and player 2 is the adversary. By setting \(\epsilon_2 = 0\), we ensure that a rational adversary will never deviate from the expected equilibrium behavior. Such equilibria are important in security games, since \(\epsilon_2 > 0\) could allow the adversary to deviate from her optimal strategy solely for the purpose of causing significant loss to the defender. Also, in an \((\epsilon_1, 0)\)-SPE, the defender is guaranteed to be within \(\epsilon_1\) of her optimal cost (the cost corresponding to a \((0, 0)\)-SPE), which is particularly relevant for \(\mathcal{D}\) in the audit game, since \(\mathcal{D}\) is rational and budget-constrained.

The following useful property of history-independent strategies, which follows directly from the definition, helps in understanding our proposed history-independent audit strategy.

Property 1. If a strategy profile \(\sigma\) is history-independent, i.e., \(\sigma_1(h) = \sigma_1()\) and \(\sigma_2(h) = \sigma_2(a(h))\), then the condition to test for SPE reduces to \(E(r_1(\sigma_1(), y)) \ge E(r_1(s_1, y))\) for player 1 and to \(E(r_2(\sigma_2(h), y)) \ge E(r_2(s_2, y))\) for player 2, since \(g_i(\sigma, h; y)\) is the same for all \(y\) and each \(i\). Also, if \(E(r_i(s_i, y)) - E(r_i(\sigma_i(h), y)) \le \epsilon_i\) for all \(i\) and \(s_i\), then \(\sigma\) is an \((\epsilon_1, \epsilon_2)\)-SPE strategy profile.
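Property 1 reduces the equilibrium check for history-independent profiles to one-round comparisons. The sketch below computes the largest one-round deviation gains \((\epsilon_1, \epsilon_2)\) for a toy two-player game. The expected stage utilities r1 and r2 (already averaged over public signals), the action sets and the strategies are all hypothetical; player 2's strategy is checked against every action of player 1, since \(\sigma_2\) responds to \(a(h)\).

```python
def one_round_gains(r1, r2, S1, S2, sigma1, sigma2):
    """Largest one-round gains (eps1, eps2) from unilateral deviation for a
    history-independent profile: sigma1 is a fixed action, and sigma2 maps
    player 1's last action to a response."""
    on_path = r1(sigma1, sigma2(sigma1))
    eps1 = max(r1(s1, sigma2(s1)) for s1 in S1) - on_path
    eps2 = max(max(r2(a, s2) for s2 in S2) - r2(a, sigma2(a)) for a in S1)
    return eps1, eps2

# Toy audit game (hypothetical numbers): player 1 picks an audit level,
# player 2 complies or violates.
I, PENALTY, AUDIT_COST, LOSS = 6, 10, 2, 5
S1, S2 = [0, 1], ["comply", "violate"]

def r1(a, s2):
    """Expected stage utility of the auditor."""
    return -AUDIT_COST * a - (LOSS if s2 == "violate" else 0)

def r2(a, s2):
    """Expected stage utility of the employee."""
    return (I - PENALTY * a) if s2 == "violate" else 0

def best_response(a):
    return "violate" if I - PENALTY * a > 0 else "comply"

print(one_round_gains(r1, r2, S1, S2, 1, best_response))  # (0, 0): a (0, 0)-SPE
print(one_round_gains(r1, r2, S1, S2, 0, best_response))  # (3, 0): a (3, 0)-SPE
```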

4.2  Equilibrium  in  the  Audit  Game 

We next state an equilibrium strategy profile for the game \(G_{\mathcal{A},k}\) in which \(\mathcal{D}\) performs almost cost-optimally. Formally, we present an \((\epsilon_{\mathcal{A},k}, 0)\)-SPE strategy profile and calculate the value \(\epsilon_{\mathcal{A},k}\). We also prove that the strategy profile is within \(\epsilon_{\mathcal{A},k}\) of the optimal cost of auditing for the organization. We accordingly refer to this strategy profile as a near-optimal strategy profile (for \(\mathcal{D}\)).

It is important that \(\mathcal{D}\) makes its strategy publicly known and provides a means to verify that it is playing that strategy. Indeed, the first mover in an extensive-form game has an advantage in deciding the outcome of the game. Players can avail themselves of this advantage by committing to their strategies, as in inspection games [21]. As noted earlier, even though \(\mathcal{D}\) acts after \(\mathcal{A}\) does, by committing to its strategy with a verification mechanism, \(\mathcal{D}\) simulates a first move by making the employee believe its commitment with probability 1. This also removes any variability in the beliefs that different employees may have about the organization's strategy. Thus, we envision the organization making a commitment to stick to its strategy and providing a proof that it is following the strategy. We argue that \(\mathcal{D}\) will be willing to do so because (1) the equilibrium is an \((\epsilon_1, 0)\)-SPE, so \(\mathcal{A}\) is not likely to deviate; (2) \(\mathcal{D}\) is patient and can bear a small loss due to occasional mistakes (with probability \(\epsilon_{th}\)) by \(\mathcal{A}\); and (3) the equilibrium we propose is close to the optimal cost for \(\mathcal{D}\). Hence, the organization would be willing to commit to it, and the employees would believe the commitment. Further, making its strategy publicly known follows the general security principle of not keeping security mechanisms secret [51].

The main idea behind the definition of the near-optimal strategy profile is that \(\mathcal{D}\) optimizes its utility assuming the best response of \(\mathcal{A}\) for a given \(u^t\). That is, \(\mathcal{D}\) assumes that \(\mathcal{A}\) does not commit any violations when \((P, \alpha)\) is in the deterred region, and systematically commits violations otherwise (i.e., all of \(\mathcal{A}\)'s tasks are violations). Further, \(\mathcal{D}\) assumes the worst case when the employee (with probability \(\epsilon_{th}\)) accidentally makes a mistake in the execution of her strategy; in such a case, \(\mathcal{D}\) expects all of \(\mathcal{A}\)'s tasks to be violations, regardless of the values of \((P, \alpha)\). This is because the distribution \(D^t()\) over violations when \(\mathcal{A}\) makes a mistake is unknown.

In other words, the expected cost function that \(\mathcal{D}\) optimizes (for each total number of tasks \(u^t\)) is a linear combination: \((1 - \epsilon_{th})\) times the cost under the best response of \(\mathcal{A}\), plus \(\epsilon_{th}\) times the cost when \(\mathcal{A}\) commits all violations. The expected cost function differs between the deterred and non-deterred regions because \(\mathcal{A}\)'s best response differs in these two regions. The boundary between the deterred and non-deterred regions is determined by the value of the adversary's personal benefit \(I\). We assume that \(\mathcal{D}\) learns the value of the personal benefit within an error \(\delta I\) of its actual value, and that \(\mathcal{D}\) does not choose actions \((P, \alpha)\) in the region of uncertainty determined by the error \(\delta I\).

Formally, the expected reward is \(E(\mathrm{Rew}_{\mathcal{D}})[0]\) when the adversary commits no violation, and \(E(\mathrm{Rew}_{\mathcal{D}})[u^t]\) when all \(u^t\) tasks are violations. Both of these expected rewards are functions of \(P\) and \(\alpha\); we do not make that explicit for notational ease. Denote the deterred region determined by the parameter \(I\) and the budget as \(R^I_D\), and the non-deterred region as \(R^I_{ND}\). Either of these regions may be empty. Denote the region (of uncertainty) between the curves determined by \(I + \delta I\) and \(I - \delta I\) as \(R^I_{\delta I}\). Then the reduced deterred region is given by \(R^I_D \setminus R^I_{\delta I}\) and the reduced non-deterred region by \(R^I_{ND} \setminus R^I_{\delta I}\). The near-optimal strategy we propose is:
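• For each possible number of tasks \(u^t\) that can be performed by \(\mathcal{A}\), \(\mathcal{D}\), using budget \(b_{\mathcal{A},k}\), assumes the expected utility

\[ U_D(P, \alpha) = (1 - \epsilon_{th})\, E(\mathrm{Rew}_{\mathcal{D}})[0] + \epsilon_{th}\, E(\mathrm{Rew}_{\mathcal{D}})[u^t] \]

in \(R^I_D \setminus R^I_{\delta I}\), and

\[ U_{ND}(P, \alpha) = (1 - \epsilon_{th})\, E(\mathrm{Rew}_{\mathcal{D}})[u^t] + \epsilon_{th}\, E(\mathrm{Rew}_{\mathcal{D}})[u^t] \]

in \(R^I_{ND} \setminus R^I_{\delta I}\). \(\mathcal{D}\) calculates the maximum expected utility across the two regions as follows:

- \(U^D_{\max} = \max_{(P,\alpha) \in R^I_D \setminus R^I_{\delta I}} U_D(P, \alpha)\)

- \(U^{ND}_{\max} = \max_{(P,\alpha) \in R^I_{ND} \setminus R^I_{\delta I}} U_{ND}(P, \alpha)\)

- \(U_{\max} = \max(U^D_{\max}, U^{ND}_{\max})\)

\(\mathcal{D}\) commits to the corresponding maximizer \((P, \alpha)\) for each \(u^t\).

After learning \(u^t\), \(\mathcal{D}\) plays the corresponding \((P, \alpha)\).

• \(\mathcal{A}\) plays her best response (based on the committed action of \(\mathcal{D}\)), i.e., if she is deterred for all \(u^t\), she commits no violations, and if she is not deterred for some \(u^t\), then all her tasks are violations and she chooses the \(u^t\) that maximizes her utility from violations. However, she also makes mistakes with probability \(\epsilon_{th}\), in which case her action is determined by the distribution \(D^t()\).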
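The maximization in the first step can be carried out by brute force over a discretized feasible space. The following is a minimal sketch for a single \(u^t\), not the paper's implementation. It assumes the linear loss function of Eqn. (2), the Section 3.3 instantiation and values, \(p = 0.5\), the grid increments of Section 6, and a hypothetical learning error \(\delta I\) = $0.5, and it returns the best \((P, \alpha)\) outside the region of uncertainty.

```python
R_INT = R_EXT = 300.0
C, E, MU, P_EXT, P_F, EPS_TH = 50.0, 10.0, 1.5, 0.5, 10.0, 0.03  # p = 0.5 is assumed

def nu(a):
    return MU * a - (MU - 1) * a * a

def exp_reward_D(P, a, v, u):
    """E(Rew_D) for v violations among u tasks, with the linear loss of Eqn. (2)."""
    n = nu(a)
    return -v * (R_INT * n + R_EXT * P_EXT * (1 - n)) - C * a * u - E * P

def penalty(P, a):
    """P (nu(a) + p (1 - nu(a))): the curve value I' on which (P, a) lies."""
    n = nu(a)
    return P * (n + P_EXT * (1 - n))

def near_optimal_action(I, dI, u, step_P=0.5, step_a=0.05):
    """Return (U_max, P, alpha), skipping points in the region of uncertainty."""
    best = None
    for j in range(round(P_F / step_P) + 1):
        for i in range(round(1 / step_a) + 1):
            P, a = j * step_P, i * step_a
            curve = penalty(P, a)
            if curve > I + dI:      # reduced deterred region: U_D
                util = (1 - EPS_TH) * exp_reward_D(P, a, 0, u) + EPS_TH * exp_reward_D(P, a, u, u)
            elif curve < I - dI:    # reduced non-deterred region: U_ND
                util = exp_reward_D(P, a, u, u)
            else:                   # region of uncertainty: D does not play here
                continue
            if best is None or util > best[0]:
                best = (util, P, a)
    return best

print(near_optimal_action(I=6.0, dI=0.5, u=40))  # approx. (-841.875, 10.0, 0.25)
```

With these assumed values, the cheapest deterring point uses the maximum punishment and 25% coverage, which beats the best non-deterred point (no auditing, expected utility of -$6000).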

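Let \(U^{D+\delta I}_{\max} = \max_{(P,\alpha) \in R^I_D \cup R^I_{\delta I}} U_D(P, \alpha)\) and \(U^{ND+\delta I}_{\max} = \max_{(P,\alpha) \in R^I_{ND} \cup R^I_{\delta I}} U_{ND}(P, \alpha)\). We have the following result: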

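Theorem 1. The near-optimal strategy profile (defined above) is an \((\epsilon_{\mathcal{A},k}, 0)\)-SPE for the game \(G_{\mathcal{A},k}\), where \(\epsilon_{\mathcal{A},k}\) is

\[ \max\Big( \max_{u^t}\big(U^{D+\delta I}_{\max} - U^{D}_{\max}\big),\; \max_{u^t}\big(U^{ND+\delta I}_{\max} - U^{ND}_{\max}\big) \Big) + \epsilon_{th} \max_{\alpha \in [0,1]} \Big( \sum_{j=0}^{m-1} \delta_{\mathcal{D}}^{j}\, E(r(O^t, j))[U_k, U_k, \alpha] \Big). \]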


Remark 1. The analysis of the theorem above is stable to errors in the parameter values, i.e., if the value of any parameter is wrong, the analysis stays the same, resulting in an \((\epsilon, 0)\)-SPE, but with \(\epsilon\) greater than \(\epsilon_{\mathcal{A},k}\).

The proof is in Appendix B. It involves showing that the near-optimal strategy profile has the single-stage deviation property. That \(\mathcal{A}\) does not profit from deviating is immediate, because \(\mathcal{A}\) chooses the best response in each round of the game. The bound on the profit from deviation for \(\mathcal{D}\) has two terms. The first term arises because \(\mathcal{D}\) ignores the region of uncertainty when maximizing its utility. The maximum difference in utility for the deterred region is \(\max_{u^t}(U^{D+\delta I}_{\max} - U^{D}_{\max})\), and for the non-deterred region it is \(\max_{u^t}(U^{ND+\delta I}_{\max} - U^{ND}_{\max})\); the first term is the maximum of these quantities. The second term arises from the worst-case assumption that all \(U_k\) of the maximum possible \(U_k\) tasks are violations when \(\mathcal{A}\) makes a mistake, as compared to the case in which \(D^t\) is known. Since \(\mathcal{A}\)'s choice only affects the violation-loss part of \(\mathcal{D}\)'s utility, and mistakes happen only with probability \(\epsilon_{th}\), the second term is the maximum possible loss due to violations multiplied by \(\epsilon_{th}\).

Numeric applications. The above theorem can be used to calculate concrete values for \(\epsilon_{\mathcal{A},k}\) when all parametric functions are instantiated. For example, with the values in Section 3.3, we obtain \(\epsilon_{\mathcal{A},k}\) = $200. Assuming \(\mathcal{A}\) performs the maximum number of tasks, \(U_k = 40\), \(\epsilon_{\mathcal{A},k}\) is about 9.5% of the cost of auditing all actions of \(\mathcal{A}\) with the maximum punishment rate and no violations ($2100), and about 3.3% of the cost incurred when all violations are caught externally, with no internal auditing or punishment ($6000). Similarly, if we assume 70% audit coverage with maximum punishment and four violations, the expected cost for the organization is $2583, so \(\epsilon_{\mathcal{A},k}\) corresponds to about 7.7% of this cost. We present the derivation of the value of \(\epsilon_{\mathcal{A},k}\) in Claim 1 in Appendix B. The audit coverage here is for one employee only; hence, it can be as high as 100%. Also, since \(G_{\mathcal{A}}\) is a parallel composition of the games \(G_{\mathcal{A},k}\) for all \(k\), we claim that the near-optimal strategy profile followed for all games \(G_{\mathcal{A},k}\) is a \((\sum_k \epsilon_{\mathcal{A},k}, 0)\)-SPE strategy profile for \(G_{\mathcal{A}}\) (see Lemma 3 in Appendix B.1). Finally, as noted earlier, the asymmetric equilibrium enables us to claim that the expected cost for \(\mathcal{D}\) in the near-optimal strategy profile is at most \(\sum_k \epsilon_{\mathcal{A},k}\) more than the optimal cost. Since costs add up linearly, \(\mathcal{D}\)'s cost in the whole audit process is at most \(\sum_{\mathcal{A},k} \epsilon_{\mathcal{A},k}\) more than the optimal cost. Also, it follows from Remark 1 that errors in parameter estimates move the cost of the audit process further away from the optimal cost.
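The reference costs quoted above can be reproduced from the expected cost of \(\mathcal{D}\) for the loss function of Eqn. (2) with the Section 3.3 values. This is a check we add, not the derivation of Claim 1: it takes \(\epsilon_{\mathcal{A},k}\) = $200 as given and assumes \(p = 0.5\), the value under which the all-external figure comes out to $6000.

```python
U, C, E, P_F = 40, 50.0, 10.0, 10.0   # tasks, audit cost, punishment loss, max punishment
R_INT = R_EXT = 300.0
P_EXT, MU = 0.5, 1.5                  # p = 0.5 is an assumption
EPS = 200.0                           # epsilon_{A,k} as reported in the text

def nu(a):
    return MU * a - (MU - 1) * a * a

def expected_cost(alpha, P, v, u=U):
    """-E(Rew_D) for v violations among u tasks at audit level alpha, punishment P."""
    n = nu(alpha)
    return v * (R_INT * n + R_EXT * P_EXT * (1 - n)) + C * alpha * u + E * P

full_audit = expected_cost(1.0, P_F, 0)     # audit everything, max punishment, no violations
all_external = expected_cost(0.0, 0.0, U)   # no auditing, every task a violation
mixed = expected_cost(0.7, P_F, 4)          # 70% coverage, max punishment, 4 violations

for name, cost in [("full audit", full_audit), ("all external", all_external), ("mixed", mixed)]:
    print(f"{name}: ${cost:.0f}, eps is {100 * EPS / cost:.1f}% of it")
# full audit: $2100, eps is 9.5% of it
# all external: $6000, eps is 3.3% of it
# mixed: $2583, eps is 7.7% of it
```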
Figure 2: Visual representation of Algorithm 1, shown for \(l = 4\) lines and \(n = 3\) points (×) on the lines.

5  Learning  Personal  Benefit 

In this section, we propose a learning mechanism to learn the personal benefit parameter \(I\) of \(\mathcal{A}\), and prove that the algorithm is an effective learner. \(\mathcal{D}\) has a prior belief about the value of \(I\), namely \(I \in [I^* - i_0, I^* + i_0]\) for parameters \(I^*\) and \(i_0\). We assume the belief is correct. We call the intersection of the feasible region and the region between the curves given by \(I^* - i_0\) and \(I^* + i_0\) the search region.

Algorithm 1 is the adversarial learning algorithm we propose to learn the separator given by \(I = P^t(\nu(\alpha^t) + p(1 - \nu(\alpha^t)))\). The idea of the algorithm is to use a rotating sweep-line [18] style technique, often used in geometric algorithms. Multiple (\(l\)) lines passing through the origin and the search region are considered (see Figure 2). For any point \((P', \alpha')\) there is a unique value \(I'\) such that \(I' = P'(\nu(\alpha') + p(1 - \nu(\alpha')))\); the point \((P', \alpha')\) then lies on the curve \(I' = P(\nu(\alpha) + p(1 - \nu(\alpha)))\). Mark \(n\) points on each line within the search space such that, for any two consecutive points, if \(I'\) and \(I''\) determine the curves these points lie on, then \(|I' - I''| = \frac{2 i_0}{n + 1}\). We call the two points nearest to the true separator 1-neighbors. The algorithm works by trying to find the non-deterred 1-neighbor point on each of the \(l\) lines.

We use the standard binary search algorithm, named BinSearch, as a black box. In our setting, BinSearch's queries are points on the line \(L_i\). The point \((P', \alpha')\) lies in the deterred region if and only if \(I' > I\). The answer to the query \(I' > I\) can therefore be inferred from the number of detected violations. However, there are two additional technical challenges that must be addressed: (1) \(\mathcal{A}\) may be willing to behave (not violate) in the non-deterred region (or misbehave in the deterred region) if doing so would bring rewards in the near future, and (2) even if \(\mathcal{A}\) plays according to her immediate preferences during the learning phase, the answers to BinSearch's queries are noisy due to the trembling-hand assumption. We address the first challenge by ensuring that during any round of the learning phase the next \(d\) queries are not affected by \(\mathcal{A}\)'s current actions. This ensures that \(\mathcal{A}\) always reveals her true preferences for points other than 1-neighbor points (Lemma 1). We address the second challenge (the noisy binary search problem) by querying points multiple times and taking the majority vote to ensure that the results are accurate with high probability (Lemma 2).
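The majority-vote step can be illustrated in isolation. In this sketch we assume, for simplicity, that each audit round independently reports the wrong answer with probability \(\epsilon_{th}\) (the trembling hand) and that detection is otherwise perfect; the function names are hypothetical. The exact success probability of one query is a binomial tail, which the simulation approximates empirically.

```python
import random
from math import comb

def majority_success_probability(q: int, eps: float) -> float:
    """P[at least q + 1 of 2q + 1 independent answers are correct]."""
    n = 2 * q + 1
    return sum(comb(n, i) * (1 - eps) ** i * eps ** (n - i) for i in range(q + 1, n + 1))

def noisy_majority_query(truly_deterred: bool, q: int, eps: float, rng: random.Random) -> bool:
    """Audit the same point 2q + 1 times and return the majority answer."""
    deter = 0
    for _ in range(2 * q + 1):
        mistake = rng.random() < eps          # assumed: a mistake flips the observed answer
        deter += (truly_deterred != mistake)  # counts rounds that look 'deterred'
    return deter >= q + 1

rng = random.Random(0)
trials = 10_000
hits = sum(noisy_majority_query(True, 1, 0.03, rng) for _ in range(trials))
print(round(majority_success_probability(1, 0.03), 6))  # 0.997354
print(hits / trials)  # empirical estimate, close to the value above
```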

We make multiple copies of BinSearch, such that BinSearch\(_i\) searches on line \(L_i\). BinSearch returns the desired point in the field Result, which can be null if there is no such point. A field Done checks whether BinSearch has finished its processing, and allDone checks whether all copies of BinSearch have finished. If some copy of BinSearch finishes before the others, a dummy point \((\alpha'_0, P'_0)\) may be queried to ensure that the next \(d\) queries remain fixed. Each query point is used \(2q + 1\) times for auditing before a majority vote determines the answer. Finally, \(d\) dummy rounds of audit are performed with points \((\alpha'_1, P'_1), \ldots, (\alpha'_d, P'_d)\) chosen before the algorithm starts. The first step in analyzing Algorithm 1 is proving that the optimal behavior of \(\mathcal{A}\) is to play the best response in most rounds, which is not true in general for repeated games.

Lemma 1. Assume \((2q + 1)(l - 1) > d\). Then in Algorithm 1 the adversary chooses the best response for all points on each line, except the 1-neighbors, if \(d > \log_{1/\delta_{\mathcal{A}}}\big((K + 1)\, I\, U_k \cdots\big)\).

Algorithm 1 Adversarial learning

Require: \(n, l, d, q, I^*, i_0, (\alpha'_0, P'_0), (\alpha'_1, P'_1), \ldots, (\alpha'_d, P'_d)\), with \((2q + 1)(l - 1) > d\).
  \(\theta \leftarrow\) angular spacing between consecutive lines; \(L_i\) is the line through \((0, 0)\) at angle \(\phi + (i - 1)\theta\).
  Let \(d_i = 2 i_0/(n + 1)\)
  for \(i = 1\) to \(l\) do
    Mark \(n\) points on \(L_i\) in the search space, so that the \(j\)th (\(1 \le j \le n\)) point lies on the curve given by \(I^* - i_0 + j d_i\).
  end for
  Make \(l\) copies of BinSearch: BinSearch\(_1\), ..., BinSearch\(_l\)
  while allDone \(\ne\) true do
    for \(i = 1\) to \(l\) do
      deter \(\leftarrow 0\); \((\alpha^q_i, P^q_i) \leftarrow (\alpha'_0, P'_0)\) {default query}
      if BinSearch\(_i\).Done \(\ne\) true then
        \((\alpha^q_i, P^q_i) \leftarrow\) BinSearch\(_i\).nextQuery
      end if
      for \(k = 1\) to \(2q + 1\) do
        Play the point \((\alpha^q_i, P^q_i)\) in the audit game
        if number of detected violations = 0 then
          deter \(\leftarrow\) deter + 1
        end if
      end for
      if deter \(\ge q + 1\) then
        Return deterred to BinSearch\(_i\)
      else
        Return non-deterred to BinSearch\(_i\)
      end if
      if BinSearch\(_i\).Done = true then
        \((\alpha_i, P_i) \leftarrow\) BinSearch\(_i\).Result
      end if
    end for
    allDone \(\leftarrow\) BinSearch\(_1\).Done \(\wedge \cdots \wedge\) BinSearch\(_l\).Done
  end while
  Play the points \((\alpha'_1, P'_1), \ldots, (\alpha'_d, P'_d)\) in the audit game.
  Use the curve (of the form of the separator) that passes through the majority of the points \((\alpha_1, P_1), \ldots, (\alpha_l, P_l)\) to learn \(I\).

The key idea in the proof is that in any non-dummy round the next \(d\) queries are fixed and known to \(\mathcal{A}\), which is ensured by the \(d\) dummy rounds at the end. So the adversary can earn a benefit from not playing the best response only after at least \(d\) rounds. Since the employee is not patient (small \(\delta_{\mathcal{A}}\)), utility earned in the future is not very important to her; by making \(d\) large enough, the future benefit from not playing the best response becomes insignificant for \(\mathcal{A}\). The proof is in Appendix B. Next, we claim that Algorithm 1 is an effective learner.

Theorem 2. Assume that all conditions and the result of Lemma 1 hold. In Algorithm 1, the defender learns the value of \(I\) with error bounded by \(4 i_0/(n + 1)\) and with probability greater than

\[ \left( \sum_{i=q+1}^{2q+1} \binom{2q+1}{i} (1 - \epsilon_{th})^{i}\, \epsilon_{th}^{\,2q+1-i} \right)^{l \lceil \log_2 n \rceil} \]

in at most \(l(2q + 1)\lceil \log_2 n \rceil + d\) rounds.

The proof is in Appendix B. It uses the majority-vote technique for the \(l\) lines, with the number of BinSearch queries no larger than \(\lceil \log_2 n \rceil\), to obtain the high-probability bound. Then, since any two consecutive points on each line lie on curves given by \(I'\) and \(I''\) such that \(|I' - I''| = \frac{2 i_0}{n + 1}\), we obtain the error bound on the learned parameter.
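To make the learning procedure concrete, here is a much-simplified simulation in the spirit of Algorithm 1. It rests on several assumptions of ours: each marked point is represented only by the curve value \(I'\) it lies on (so the \(l\) lines differ only in their noise), the adversary answers myopically as Lemma 1 guarantees, every audit round independently flips the deterred/non-deterred outcome with probability \(\epsilon_{th}\) and is otherwise observed perfectly, the \(l\) searches run one after another rather than interleaved, and the \(d\) dummy rounds are omitted. The estimate is the midpoint of the interval selected by the majority of the lines.

```python
import random
from collections import Counter

def learn_personal_benefit(I_true, I_star, i0, n, l, q, eps, seed=0):
    """Simulate a simplified Algorithm 1; return (estimate of I, audit rounds used)."""
    rng = random.Random(seed)
    gap = 2 * i0 / (n + 1)
    # bounds[j] for j = 1..n is the curve value I* - i0 + j * gap of the j-th point;
    # bounds[0] and bounds[n + 1] are the ends of the prior interval.
    bounds = [I_star - i0] + [I_star - i0 + j * gap for j in range(1, n + 1)] + [I_star + i0]
    rounds = 0

    def query(j):
        """Majority of 2q + 1 noisy audits at point j; True means 'deterred'."""
        nonlocal rounds
        truly_deterred = bounds[j] > I_true  # the point is deterred iff I' > I
        deter = 0
        for _ in range(2 * q + 1):
            rounds += 1
            deter += truly_deterred != (rng.random() < eps)  # assumed independent flips
        return deter >= q + 1

    picks = []
    for _ in range(l):
        # Binary search for the largest non-deterred index j in 1..n (0 if none).
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if query(mid):
                hi = mid - 1
            else:
                lo = mid
        picks.append(lo)

    j_hat = Counter(picks).most_common(1)[0][0]
    # I lies between the curve of point j_hat and that of point j_hat + 1.
    return (bounds[j_hat] + bounds[j_hat + 1]) / 2, rounds

estimate, rounds = learn_personal_benefit(I_true=6.0, I_star=6.2, i0=1.0, n=7, l=5, q=1, eps=0.0)
print(round(estimate, 3), rounds)  # 6.075 45
```

Adding the \(d\) dummy rounds to the 45 audit rounds of this noise-free run gives the round count of Theorem 2 for the same \(n\), \(l\) and \(q\).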
Observe that choosing a large \(n\) ensures a small error in the learned value of \(I\). Also, using Hoeffding's inequality [27], the probability bound above is at least \(\left(1 - \exp(-2(2q + 1)(0.5 - \epsilon_{th})^2)\right)^{l(\lfloor \log_2 n \rfloor + 1)}\), which is higher for higher values of \(q\). But a large \(n\) and \(q\) also increase the running time of the algorithm, which in practice should not be large. Thus, operationally, a balance is required in the choice of \(n\) and \(q\). We now show some concrete values for Theorem 2. Using \(i_0 = 1\) and \(n = 7\), in addition to the values from Section 3.3, Lemma 1 gives \(d > 10.19\); thus, we choose \(d = 11\). We choose \(l = 5\) and \(q = 1\) to satisfy \((2q + 1)(l - 1) > d\). Then Algorithm 1 produces a value for \(I\) with error bound 0.5 and probability 0.96 in 56 rounds. Thus, with daily audits, the adversary's personal benefit is learned in 2 months with near certainty.
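The concrete numbers above can be recomputed from the bounds. The sketch below evaluates the Theorem 2 probability bound (the per-query majority-vote success probability raised to \(l\lceil \log_2 n \rceil\)), the Hoeffding-based lower bound, the round count and the error bound for \(i_0 = 1\), \(n = 7\), \(l = 5\), \(q = 1\), \(d = 11\) and \(\epsilon_{th} = 0.03\). The helper names are illustrative.

```python
from math import ceil, comb, exp, floor, log2

def theorem2_probability(q, eps, l, n):
    """Per-query majority success probability, raised to l * ceil(log2 n)."""
    per_query = sum(comb(2 * q + 1, i) * (1 - eps) ** i * eps ** (2 * q + 1 - i)
                    for i in range(q + 1, 2 * q + 2))
    return per_query ** (l * ceil(log2(n)))

def hoeffding_probability(q, eps, l, n):
    """Hoeffding-based lower bound on the probability above."""
    return (1 - exp(-2 * (2 * q + 1) * (0.5 - eps) ** 2)) ** (l * (floor(log2(n)) + 1))

def max_rounds(q, l, n, d):
    return l * (2 * q + 1) * ceil(log2(n)) + d

def error_bound(i0, n):
    return 4 * i0 / (n + 1)

q, l, n, d, i0, eps = 1, 5, 7, 11, 1.0, 0.03
print(round(theorem2_probability(q, eps, l, n), 2))  # 0.96
print(max_rounds(q, l, n, d), error_bound(i0, n))    # 56 0.5
print(hoeffding_probability(q, eps, l, n) <= theorem2_probability(q, eps, l, n))  # True
```

For \(q = 1\) the Hoeffding bound is very loose (about 0.01 here), so the concrete 0.96 comes from the exact binomial expression.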

Before concluding the section, we discuss the algorithm above in the context of two areas of algorithm design: mechanism design and noisy binary search. First, mechanism design is a technique for designing game strategies and incentives so that players with a private type (\(I\) in our case) reveal the true value of their type. Its uses range widely, from auctions to selling goods. In online mechanism design with fixed types, commitment to a strategy by the mechanism designer is required for truthful revelation of type [43]. If \(\mathcal{D}\) commits to playing all actions in the learning phase, then our algorithm can be considered an instance of approximate [52] online mechanism design [43] with player \(\mathcal{A}\) and mechanism designer \(\mathcal{D}\), i.e., \(\mathcal{A}\) has incentives to reveal her type \(I\) up to a \(\delta I\) deviation. The approximation arises because \(\mathcal{D}\) cannot learn \(I\) perfectly from the finite learning phase. In practice, however, \(\mathcal{D}\)'s commitment would not be credible, as \(\mathcal{D}\) would want to deviate to the more cost-optimal equilibrium audit strategy rather than keep playing the learning phase after learning \(I\).

We  discuss  other  algorithms  for  noisy  binary  search  that  can  be  plugged  into  Algorithm  1  to  learn  the  adversary’s 
personal  benefit.  These  algorithms  take  longer  than  the  binary  search  in  Algorithm  1  to  learn  while  guaranteeing  that 
the  learned  value  is  correct  with  higher  probability.  We  favor  the  simple  binary  search  we  use  since  1)  for  the  choice 
of  parameters  above  it  has  low  running  time  3  log  n  and  2)  the  probability  bound  we  obtain  is  acceptable  for  learning 
with humans. The binary search by Nowak [41] is based on multiplicative weight updates and takes more than

$$\frac{4\log(n/\delta)}{1-\sqrt{2\epsilon(1-\epsilon)}}$$

queries to succeed with probability 1 − δ when the error rate is ε. The algorithm by Karp et al. [31] resembles standard binary search, except that it allows backtracking: it checks whether the current sub-interval is the right one and, if not, backtracks to the larger parent interval, using at least 3 queries per interval. However, the check has a non-zero probability of error even on the right interval; thus, the expected running time of this algorithm is more than 3 log n, although its probability of success is higher than that of the simple binary search. Since running time matters in our setting, the simple binary search is the better choice for us. If the probability of success is critical, however, Algorithm 1 can use any binary search algorithm, as long as the next d steps can be determined by the adversary, so that she plays the best response (Lemma 1).
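The simple binary search discussed above, with a majority vote over 2q + 1 repeated queries at each probed point, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Algorithm 1 itself: the oracle stands in for the adversary's responses along one line, and the hypothetical helper `make_noisy_threshold_oracle` flips each answer independently with probability `eps` (a trembling-hand stand-in). Because the search is deterministic given past answers, its next queries are predictable, which is the property Lemma 1 relies on.

```python
import random
from typing import Callable


def majority_query(oracle: Callable[[int], bool], x: int, q: int) -> bool:
    """Ask the noisy oracle about point x 2q + 1 times and return the majority answer."""
    votes = sum(oracle(x) for _ in range(2 * q + 1))
    return votes > q


def noisy_binary_search(oracle: Callable[[int], bool], n: int, q: int) -> int:
    """Smallest index in [0, n) whose majority answer is True, or n if there is none.

    Assumes the noiseless answers are monotone along the line
    (False, ..., False, True, ..., True).
    """
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if majority_query(oracle, mid, q):
            hi = mid        # boundary is at mid or to its left
        else:
            lo = mid + 1    # boundary is strictly to the right of mid
    return lo


def make_noisy_threshold_oracle(threshold: int, eps: float,
                                rng: random.Random) -> Callable[[int], bool]:
    """Hypothetical oracle: the true answer is x >= threshold, flipped with probability eps."""
    def oracle(x: int) -> bool:
        truth = x >= threshold
        return (not truth) if rng.random() < eps else truth
    return oracle


if __name__ == "__main__":
    rng = random.Random(0)
    oracle = make_noisy_threshold_oracle(threshold=37, eps=0.03, rng=rng)
    print(noisy_binary_search(oracle, n=100, q=1))  # usually 37, but noise can mislead it
```

With q = 1 each probe costs three queries, so a search over n points uses about 3⌈log₂(n + 1)⌉ queries, in line with the 3 log n figure above.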

As  a  final  comment,  the  strength  of  the  algorithm  above  lies  in  the  fact  that  it  can  learn  any  separator  (with  small 
error)  by  choosing  l  properly.  Thus,  even  if  we  do  not  assume  any  knowledge  of  the  separator  we  can  still  use  the 
algorithm  above  to  learn  the  regions. 

6  Predictions  and  Interventions 

In this section, we use our model to predict observed practices in industry and the effectiveness of policy interventions in encouraging organizations to conduct more thorough audits, by analyzing the equilibrium audit strategy (P, α) under varying parameters. We use the parameter values and function instantiations given in Section 3.3 (unless otherwise noted). We assume that the value of the personal benefit I is learned exactly and that P and α take discrete values, with increments of 0.5 and 0.05, respectively. We also assume, for the sake of exposition, that
Uk  =  Uk ,  i.e.,  the  number  of  tasks  is  fixed. 

Average cost R_ext and probability p of external detection of violation. We vary R_ext from $5 to $3900, with R_int fixed at $300. The results are shown in Figure 3 for two cases, p = 0.5 and p = 0.9; the figure shows the equilibrium (P, α) chosen for each value of R_ext.
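To make the separator in Figure 3 concrete, the following Python sketch classifies the points of the discretized (P, α) grid as deterred or not for two values of p. It assumes the instantiation ν(α) = μα − (μ − 1)α² stated in Claim 1 and the deterrence condition used in the proof of Lemma 1 (a rational employee commits no violations when I < P(ν(α) + p(1 − ν(α)))). The values of I, μ, and the maximum punishment `P_max` are hypothetical placeholders, not the Section 3.3 parameters, and the sketch only locates the region boundary; it does not compute the organization's equilibrium choice.

```python
from typing import List, Tuple


def nu(alpha: float, mu: float) -> float:
    """Assumed internal detection probability nu(alpha) = mu*alpha - (mu - 1)*alpha**2."""
    return mu * alpha - (mu - 1) * alpha ** 2


def expected_punishment(P: float, alpha: float, mu: float, p: float) -> float:
    """P * (nu(alpha) + p * (1 - nu(alpha))): punishment P weighted by the chance a
    violation is caught internally (nu) or, failing that, externally (p)."""
    v = nu(alpha, mu)
    return P * (v + p * (1 - v))


def deterred_grid_points(I: float, mu: float, p: float, P_max: float) -> List[Tuple[float, float]]:
    """(P, alpha) grid points, with steps 0.5 and 0.05, where a rational employee with
    personal benefit I is deterred, i.e. I < P * (nu(alpha) + p * (1 - nu(alpha)))."""
    points = []
    for i in range(int(round(P_max / 0.5)) + 1):
        P = 0.5 * i
        for j in range(21):  # alpha = 0.00, 0.05, ..., 1.00
            alpha = 0.05 * j
            if expected_punishment(P, alpha, mu, p) > I:
                points.append((P, alpha))
    return points


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    for p in (0.5, 0.9):
        pts = deterred_grid_points(I=6.0, mu=1.5, p=p, P_max=10.0)
        print(f"p = {p}: {len(pts)} deterred grid points")
```

Because ν(α) ≤ 1 under this instantiation, raising p can only enlarge the deterred set, which is why a larger p moves the separator toward smaller punishments and inspection levels.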

Prediction 1: Increasing R_ext and p is an effective way to encourage organizations to audit more. In fact, when p · R_ext is low, X may not audit at all. Thus, X audits to protect itself from the greater loss incurred when violations are caught externally. Surprisingly, the hospital may continue to increase inspection levels (incurring higher cost) beyond the minimum level necessary to deter a rational employee. Hospital X does so because the employee is not fully rational: even in the deterred region there is an ε_th probability of violations occurring.

Figure 3: The dashed separator for 2 values of p, and the equilibrium (P, α) (see legend) for varying values of R_ext from 5 to 3900 (shown above each point).

Suggested Intervention 1: Subject organizations to external audits and fines when violations are detected. For example, by awarding contracts for conducting 150 external audits by 2012 [26], HHS is moving in the right direction by effectively increasing p. This intervention is having an impact: the 2011 Ponemon study on patient privacy [47] states: “Concerns about the threat of upcoming HHS HIPAA audits and investigation has affected changes in patient data privacy and security programs, according to 55 percent of respondents.”

Prediction 2: Interventions that increase the expected loss for both external and internal detection of violations are not as effective in increasing auditing as those that increase the expected loss for external detection only. Table 5 shows the equilibrium inspection level as R_ext and R_int are both increased at the same rate. While the inspection level may initially increase, it quickly reaches a peak. As an example, consider the principle of breach detection notification used in many data breach laws [50]. Breach detection notification increases both R_int and R_ext, since notification happens for all breaches. While there is not sufficient data for our model to predict whether these laws are less effective than external audits (see the suggested study below), prior empirical analysis [50] indicates that the benefit in breach detection from these laws is only about 6% (after adjusting for the increased reporting of breaches due to the law itself).

Suggested  study:  An  empirical  study  that  separately  reports  costs  incurred  when  violations  are  internally  detected 
from  those  that  are  externally  detected  would  be  useful  in  quantifying  and  comparing  the  effectiveness  of  interventions. 
Existing  studies  either  do  not  speak  of  these  distinct  categories  of  costs  [46,  50]  or  hint  at  the  importance  of  this 
distinction  without  reporting  numbers  [45, 57]. 

Punishment loss factor e and personal benefit I. Prediction 3: Employees with a higher value of e (e.g., doctors have a higher e, since suspending a doctor is costlier for the hospital than suspending a nurse) will face lower punishment levels. If punishments were free, i.e., e = 0 (an unrealistic assumption), our model predicts that X would always keep the punishment rate at its maximum. At higher punishment loss factors (e = 1000), X favors increasing inspections over increasing the punishment level P (see Table 1 in Appendix A). While we do not know of an industry-wide study on this topic, there is evidence of such phenomena in hospitals. For example, in 2011 Vermont's Office of Professional Regulation, which licenses nurses, investigated 53 allegations of drug diversion by nurses and disciplined 20. In the same year, the Vermont Board of Medical Practice, which regulates doctors, publicly listed 11 board actions against licensed physicians for a variety of alleged offenses. However, only one doctor had his license revoked, while the rest were allowed to continue practicing [33].

Prediction 4: Employees who cannot be deterred are not punished. When the employee's personal benefit I is high, our model predicts that X chooses the punishment rate P = 0 (because this employee cannot be deterred at all) and increases inspection as R_ext increases, to minimize the impact of violations by catching them internally (see Table 2 in Appendix A). Note that this holds only for violations that are not very costly (as is the case for our choice of costs). If the expected violation cost exceeds the value generated by the employee, then it is better to fire the non-deterred employee (see Appendix B.2).

When I is low, the employee is deterred even for low values of P and α. While this seems good for X, an important consideration is the trust trap [5]. In a trust trap, the employee earns the trust of X by behaving properly for many audit cycles, and then commits a costly violation and leaves. Our model predicts the trust trap.

Prediction 5: If only the history of employee actions is used to learn I and there have been no past violations, then the learned value of I will be small. A small I means that X will select lower inspection levels. This would enable a patient (and devious!) employee to get away with costly violations.

Suggested  Intervention  5:  X  can  make  informed  decisions  to  avoid  the  trust  trap,  e.g.,  set  a  minimum  value  for  I. 

Audit cost C and performance factor μ of log analysis tool.

Prediction 6: If the audit cost C decreases or the performance μ of log analysis increases, then the equilibrium inspection level increases. The data supporting this prediction is presented in Tables 3 and 4 in Appendix A. Intuitively, if the cost of auditing goes down, organizations would be expected to audit more with the fixed budget allocated for auditing. Similarly, a more efficient mechanized audit tool enables the organization to audit more with the same fixed budget. For example, MedAssets claims that Stanford Hospitals and Clinics saved about $4 million by using automated tools for auditing [38].

7  Related  Work 

Auditing and Accountability: Prior work studies the orthogonal questions of algorithmic detection of policy violations [6, 8, 22, 54] and blame assignment [4, 7, 29, 34]. Feigenbaum et al. [20] report work in progress on formal definitions of accountability capturing the idea that violators are punished with or without identification and mediation with non-zero probability, and that punishments are determined based on an understanding of “typical” utility functions. Operational considerations of how to design an accountability mechanism that effectively manages organizational risk are not central to their work. In other work, auditing is employed to revise access control policies when unintended accesses are detected [9, 35, 56]. Another line of work uses logical methods, based on evidence recorded in audit logs, to enforce a class of policies that cannot be enforced using preventive access control mechanisms [13]. Cheng et al. [15, 16] extend access control by allowing agents access based on risk estimations. A game-theoretic approach that couples access control with audits of escalated access requests in the framework of a single-shot game is studied by Zhao et al. [59]. These works are fundamentally different from our approach. We are interested in scenarios where access control is not desirable and audits are used to detect violations. We believe that a repeated game can better model the repeated interactions of auditing.

Risk Management and Data Breaches: Our work is an instance of a risk management technique [40, 44] in the context of auditing and accountability. As far as we know, our technique is the first instance of managing risk in auditing using a repeated game formalism. Risk assessment has been extensively used in many areas [30, 49]; the report by the American National Standards Institute [2] provides a risk assessment mechanism for healthcare. Our model also captures data breaches that happen due to insider attacks. Reputation has been used to study insider attacks in non-cooperative repeated games [58]; we differ from that work in that the employer-employee interaction is essentially cooperative. Also, the primary purpose of interaction between employer and employee is to accomplish some task (e.g., provide medical care); privacy is typically a secondary concern. Our model captures this reality by considering the effect of non-audit interactions in parameters like P_f. There are quite a few empirical studies on data breaches and insider attacks [46, 50, 57] and qualitative models of insider attacks [5]. We use these studies to estimate the parameters in our model and to evaluate the predictions of our model.

Adversarial Learning: One of the earliest works on adversarial machine learning is by Kearns et al. [32], which extends the PAC learning model to allow a fixed probability of labeling error. Auer et al. [3] extend the online model to allow for bounded malicious error. In these works, the adversary can always fool the learner unless there is a constraint on the number of examples she can label wrongly. In contrast, in our setting the training data points are chosen by the learner (organization) and the labels are provided by the adversary (employee) in an online manner. The learner outputs the separator after all the training data has been collected. We use novel reasoning involving delayed benefits for the impatient adversary in a repeated game setting to impose bounds on the malicious error.

Lowd and Meek [36] study the problem of learning in an adversarial setting by proposing a framework for reverse engineering classifiers in order to perform cost-optimal (with respect to the adversary's cost) evasion in a reasonable amount of time. They further propose a classifier modification that predicts the adversary's evasion based on the assumption of myopic adversary actions and adapts the classifier to counter the evasion. In particular, they do not look for an equilibrium of this repeated game. Nelson et al. [39] provide models of attacks on machine learning algorithms and demonstrate a few attacks. They also improve upon algorithms to find cost-optimal evasions by the attacker. In contrast, our approach uses the adversary's discounting of future benefit to allow the learner to incentivize the adversary to behave in a desired manner. Nelson et al. [39] also propose using regret-minimization techniques [11] in the repeated-game learning setting; these converge to the best classifier in hindsight with respect to a given set of classifiers, with no assumption about the adversary. In contrast, we model the near-rational adversary's unknown incentives and provide high-probability guarantees of learning.

8  Conclusion 

First, as public policy and industry move towards accountability-based privacy governance, the biggest challenge is how to operationalize requirements such as internal enforcement of policies. Principled audit and punishment schemes like the one presented in this paper will be part of the enforcement regime, which makes these results significant in practice. Second, a common complaint against this kind of risk management approach is that there is no data to estimate the risk parameters. We provide evidence that a number of parameters in the game model can be estimated from prior empirical studies, suggest specific studies that can help estimate other parameters, and design a learning algorithm that the defender can use to provably learn the adversary's private incentives. Moving forward, we plan to generalize our results to account for colluding adversaries, explore the space of effective policy interventions, and evaluate these mechanisms through user studies.

B Proofs


Reminder of Lemma 1. Assume (2q + 1)(l − 1) ≥ d. Then in Algorithm 1 the near-rational adversary chooses the best response for all points on each line, except the 1-neighbors, if

$$d > \log_{1/\delta_A}\frac{(n+1)\,(P_f U_k + I_{\max} U_k)}{(1-\delta_A^2)\,2I_0}.$$


Proof. First, observe that at any round t the organization's action is known and fixed for the next (2q + 1)(l − 1) rounds. This is because, for the next (2q + 1)(l − 1) rounds, the organization queries points on each line (2q + 1 queries on each line) as would be asked by BinSearch. These BinSearch queries are already known: the next query for each line is determined by the history of queries for that line, which is known exactly at time t.

Consider two possible choices of punishment and inspection level, (P', α') and (P, α), in some future round. The absolute difference in expected utility in that round, for numbers of violations v_t and v'_t in the two scenarios, is

$$I\,|v'_t - v_t| + \max(v_t, v'_t)\,\bigl|P(\nu(\alpha) + p(1-\nu(\alpha))) - P'(\nu(\alpha') + p(1-\nu(\alpha')))\bigr|.$$

Since P_f ≥ P(ν(α) + p(1 − ν(α))) ≥ 0 for any feasible P, I ≤ I_max, and max(v_t, v'_t) ≤ U_k, the absolute difference in expected utility per round is bounded by P_f U_k + I_max U_k. Thus, P_f U_k + I_max U_k is the maximum expected benefit per round that the adversary can get by making the organization play differently.

Next, we show that not playing the best response for any non-1-neighbor point in the learning phase results in a total expected utility that is less than the total expected utility when the best response is played. Thus, by rationality, a rational adversary will always play the best response for the non-1-neighbor points in the learning phase, i.e., provide the right answer. Observe that not playing the best response means not committing all violations when I − P(ν(α) + p(1 − ν(α))) > 0, or not committing 0 violations when I − P(ν(α) + p(1 − ν(α))) < 0. Also, the minimum loss for not playing the best response occurs when the deviation in the number of violations is only one. Let (P, α) be any non-1-neighbor point. Then a difference of one violation at this point produces a difference in utility given by

$$L_{\min} = \bigl|I - P(\nu(\alpha) + p(1-\nu(\alpha)))\bigr|.$$

L_min is the least difference in expected utility in a round between the two scenarios, when the adversary does not play the best response and when he does, since his minimum deviation is one violation. Now, the non-1-neighbor point lies on the curve defined by a value that differs from I by at least 2I_0/(n + 1) (by construction of the algorithm). Thus, L_min ≥ 2I_0/(n + 1). Also note that the majority vote for each point forces the adversary to provide at least two non-best responses to make any difference in the learning.

Recall that the learning algorithm is known to the adversary and that the organization is committed to this algorithm. Consider two scenarios: one in which the adversary decides not to play the best response for the first time at round t_0 (call it the bad world), and one in which he always plays the best response, for any point (call it the good world). Thus, before t_0 the adversary's response is the best response in both worlds. In both worlds the organization's actions are known and fixed until at least round t_0 + d. The best case for the adversary in the bad world is then when only two non-best responses (fooling the majority vote) give him the expected benefit of P_f U_k + I_max U_k after round t_0 + d. Thus, in such a case, from round t_0 + 1 to at least round t_0 + d the adversary's response is the best response in both worlds, since the learner is committed to its actions in these rounds. Since playing the best response is a history-independent strategy, the difference in expected payoff from t_0 + 1 to t_0 + d is 0 (using Property 1). We have already calculated the upper bound on the benefit in expected utility in every round after round t_0 + d as P_f U_k + I_max U_k. Thus, the difference in total expected utility between the bad and the good world (starting from round t_0) is at most

$$(1-\delta_A)(-L_{\min} - \delta_A L_{\min}) + \delta_A^{d}\,(P_f U_k + I_{\max} U_k).$$

Given the bound on d in the lemma, we have δ_A^d (P_f U_k + I_max U_k) < (1 − δ_A²) 2I_0/(n + 1) ≤ (1 − δ_A²) L_min = (1 − δ_A)(L_min + δ_A L_min), so this difference is negative. Thus, by rationality, the adversary chooses the action with more utility at round t_0, which by definition is the best response at round t_0. □
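The bound on d in Lemma 1 is easy to evaluate numerically. The following Python sketch returns the smallest integer d satisfying δ_A^d (P_f U_k + I_max U_k) < (1 − δ_A²) 2I_0/(n + 1), which is equivalent to the logarithmic condition in the lemma when 0 < δ_A < 1. The function name and the parameter values in the usage line are hypothetical, chosen only for illustration.

```python
import math


def min_dummy_rounds(delta_A: float, n: int, I0: float, P_f: float, I_max: float, U_k: float) -> int:
    """Smallest integer d with delta_A**d * (P_f + I_max) * U_k < (1 - delta_A**2) * 2 * I0 / (n + 1).

    Assumes 0 < delta_A < 1 (the adversary's discount factor).
    """
    gain = (P_f + I_max) * U_k                    # max per-round benefit of misleading the learner
    loss = (1 - delta_A ** 2) * 2 * I0 / (n + 1)  # lower bound on the loss from two wrong answers
    if gain < loss:
        return 0
    x = math.log(gain / loss) / math.log(1 / delta_A)  # need d > x
    return math.floor(x) + 1


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    print(min_dummy_rounds(delta_A=0.9, n=100, I0=1.0, P_f=10.0, I_max=5.0, U_k=40.0))  # prints 114
```

The lemma's other assumption, (2q + 1)(l − 1) ≥ d, still has to be checked separately for the chosen q and l.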

Lemma 2. Assume all conditions and results of Lemma 1 hold and that the adversary is near-rational. Let the number of queries asked by BinSearch for some given line among the l lines be N. The probability that the answers for non-1-neighbor points on the line are correct is greater than

$$\left(1 - \sum_{i=q+1}^{2q+1}\binom{2q+1}{i}\epsilon_{th}^{\,i}(1-\epsilon_{th})^{2q+1-i}\right)^{N}.$$

Proof. First, we calculate the probability of a wrong answer for any non-1-neighbor point used in the audit game. Using Lemma 1, we already know that the adversary will provide his best response, i.e., she will choose 0 violations in the deterred region and all violations in the non-deterred region, but she can make a mistake with probability ε_th due to the trembling-hand assumption. Thus, with probability 1 − ε_th the detected region will be the right region. We consider the worst case when the adversary makes a mistake. The worst case for a point in the deterred region is when the adversary commits all violations, and for a point in the non-deterred region it is when the adversary commits 0 violations; in both cases the probability of the wrong region being detected is 1. Thus, by the law of total probability, the probability of a wrong region answer in one round is upper bounded by ε_th. Also, a majority vote with 2q + 1 points yields a wrong answer when more than q answers are wrong. Since all answers are independent and each is wrong with probability at most ε_th, the probability that more than q of them are wrong is at most the corresponding binomial tail, and thus the probability of the majority vote being right is at least

$$1 - \sum_{i=q+1}^{2q+1}\binom{2q+1}{i}\epsilon_{th}^{\,i}(1-\epsilon_{th})^{2q+1-i}.$$

Then, because the same but independent procedure is repeated for the N points in the worst case (if the 1-neighbor points are queried k times, then we have only N − k such queries), we obtain the desired result. □
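To see how quickly the majority vote sharpens the guarantee, the following Python sketch evaluates the bound of Lemma 2. The function names are introduced here for illustration; the trembling-hand rate `eps_th`, the vote parameter `q`, and the number of queries `N` are inputs, and the values in the usage block are hypothetical.

```python
from math import comb


def majority_error(eps_th: float, q: int) -> float:
    """Upper bound on the chance that a majority vote over 2q+1 answers is wrong,
    when each answer is independently wrong with probability at most eps_th."""
    k = 2 * q + 1
    return sum(comb(k, i) * eps_th ** i * (1 - eps_th) ** (k - i) for i in range(q + 1, k + 1))


def line_success_lower_bound(eps_th: float, q: int, N: int) -> float:
    """Lemma 2: lower bound (1 - majority_error)^N on all N majority answers being correct."""
    return (1 - majority_error(eps_th, q)) ** N


if __name__ == "__main__":
    # Hypothetical trembling-hand rate and number of queries on one line.
    for q in (1, 2, 3):
        print(q, majority_error(0.03, q), line_success_lower_bound(0.03, q, N=7))
```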


Reminder of Theorem 2. Assume that all conditions and results of Lemma 1 hold. In Algorithm 1, the defender learns the value of I with error bounded by 4I_0/(n + 1) and probability greater than

$$\left(1 - \sum_{i=q+1}^{2q+1}\binom{2q+1}{i}\epsilon_{th}^{\,i}(1-\epsilon_{th})^{2q+1-i}\right)^{l\lceil\log_2 n\rceil}$$

in at most l(2q + 1)⌈log₂ n⌉ + d rounds.

Proof. First, for every line the number of BinSearch queries is not more than ⌈log₂ n⌉. Using Lemma 2 (with N ≤ ⌈log₂ n⌉ for each of the l lines), we can claim that with probability greater than

$$\left(1 - \sum_{i=q+1}^{2q+1}\binom{2q+1}{i}\epsilon_{th}^{\,i}(1-\epsilon_{th})^{2q+1-i}\right)^{l\lceil\log_2 n\rceil}$$

all the non-1-neighbor answers for all l lines will be correct. In such a case, the algorithm cannot return any of the deterred 2-or-higher neighbor points as the desired non-deterred point. Nor can it return any of the non-deterred 3-or-higher neighbor points as the desired point. Thus, it only returns a non-deterred 1- or 2-neighbor or a deterred 1-neighbor point as the final answer. Then, the curve fitted in the final step of the algorithm yields the value of I corresponding to the curve on which the majority of the final answers of the lines lie. By the restriction on the final answers, this curve passes through non-deterred 1- or 2-neighbor or deterred 1-neighbor points for all lines. Also, the curve for the true value of I lies between the non-deterred 1-neighbor and the deterred 1-neighbor points. By the choice of these points, we know that consecutive points lie on curves that differ in their value of I by 2I_0/(n + 1). Thus, the learned value of I differs from the true value of I by at most 4I_0/(n + 1).

The total number of rounds is upper bounded by l(2q + 1)⌈log₂ n⌉ + d, which comes from 2q + 1 rounds for each query and at most ⌈log₂ n⌉ queries for each of the l lines, followed by d dummy rounds. □
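The three quantities in Theorem 2 can be computed together. The following Python sketch does so under the theorem's assumptions, taking the number of dummy rounds d as given (for instance, from the bound in Lemma 1). The `LearningGuarantee` type, the function name, and the parameter values are illustrative choices, not part of the paper.

```python
import math
from typing import NamedTuple


class LearningGuarantee(NamedTuple):
    success_prob_lower_bound: float  # probability bound from Theorem 2
    max_error: float                 # 4 * I0 / (n + 1)
    max_rounds: int                  # l * (2q + 1) * ceil(log2 n) + d


def theorem2_guarantee(eps_th: float, q: int, l: int, n: int, d: int, I0: float) -> LearningGuarantee:
    k = 2 * q + 1
    # Probability that a majority vote over k answers is wrong (each wrong w.p. at most eps_th).
    p_wrong = sum(math.comb(k, i) * eps_th ** i * (1 - eps_th) ** (k - i) for i in range(q + 1, k + 1))
    queries_per_line = math.ceil(math.log2(n))  # BinSearch queries per line
    success = (1 - p_wrong) ** (l * queries_per_line)
    return LearningGuarantee(success, 4 * I0 / (n + 1), l * k * queries_per_line + d)


if __name__ == "__main__":
    # Hypothetical parameters: 5 lines, 64 grid points per line, 3-vote majority, 10 dummy rounds.
    print(theorem2_guarantee(eps_th=0.03, q=1, l=5, n=64, d=10, I0=1.0))
```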
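
Reminder of Theorem 1. The near-optimal strategy profile (defined above) is an (ε_{A,k}, 0)-SPE for the game G_{A,k}, where

$$\epsilon_{A,k} = \max\Bigl(\max_{v_t,a_t}\bigl(U^{\max}_{D+\delta I} - U^{\max}_{D}\bigr),\ \max_{v_t,a_t}\bigl(U^{\max}_{ND+\delta I} - U^{\max}_{ND}\bigr)\Bigr) + \epsilon_{th}\max_{\alpha\in[0,1]}\sum_{j=0}^{m-1}\delta^{j}\,E(r(O_t,j))[U_k,U_k,\alpha].$$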

Proof. First, the easy case for the employee: the employee always plays a best response. When deterred, she is indifferent among all a_t, so the choice of a_t does not matter in that case. Thus, the employee gains nothing by deviating from the history-independent strategy she follows. There are two terms in the ε_{A,k} bound for the organization. The first term bounds the profit from deviation due to the fact that the true I is not known. The second term further bounds the profit from deviation due to the fact that the distribution D_0 is unknown.

Note that we have lifted the action space of V to commitment functions. Thus, we need to compare the given commitment with other commitment functions. First, note that if the regions were known exactly and D_0 were known, then it would be possible to find the cost-optimal commitment for each fixed value of a_t. It is then enough to bound the difference in utility between the audit commitment function and this optimal commitment function across all values of a_t. We perform the analysis for a fixed a_t and then take the maximum over all a_t to bound the difference in utility when V could move to the optimal commitment. We first compare the audit commitment to itself when the true regions are known; then, assuming the true regions are known, we compare the audit commitment to the optimal commitment. Using the triangle inequality, we get the required difference for a fixed a_t. Finally, using the fact that max_x (f(x) + g(x)) ≤ max_x f(x) + max_x g(x), we get the required bound for all a_t.

Suppose the near-optimal strategy profile finds a point in the reduced deterred region, i.e., the learned deterred region with the band of uncertainty due to the error δI removed. The largest true deterred region can then be this reduced region together with that band. Thus, U^max_{D+δI} − U^max_D represents the maximum profit the organization could have obtained by deviating to another point using the true deterred region in the near-optimal strategy, for some fixed values of v_t and a_t. Taking the maximum over v_t and a_t then gives the maximum possible profit from deviation for the deterred region if the true deterred region were known. A similar argument shows that the maximum profit from deviation for the non-deterred region is max_{v_t,a_t}(U^max_{ND+δI} − U^max_{ND}). Thus, the absolute maximum profit from deviation for any region is given by the maximum of these two quantities.

Next, assume that the true deterred region R_D is known, and so is the non-deterred region R_ND. We have already shown above the maximum profit from deviation that the organization would get using the near-optimal strategy with R_D instead of the reduced deterred region, and R_ND instead of the reduced non-deterred region. Assume that the true regions are known and that the near-optimal strategy outputs (P, α) to be played by the organization. We use the simplified notation, with the game under consideration being G_{A,k}. Denote by f(P, α) the function E(Rew_D)[0], by g(P, α) the function E(Rew_D)[a_t], and by h(P, α) the function E(Rew_D)[D_0]. The function maximized by (P, α) is

$$U_D(P,\alpha) = (1-\epsilon_{th})\,f(P,\alpha) + \epsilon_{th}\,g(P,\alpha)$$

in R_D, and

$$U_{ND}(P,\alpha) = (1-\epsilon_{th})\,g(P,\alpha) + \epsilon_{th}\,g(P,\alpha)$$

in R_ND. Suppose D_0 were known and the point (P', α') is obtained by maximizing

$$U'_D(P,\alpha) = (1-\epsilon_{th})\,f(P,\alpha) + \epsilon_{th}\,h(P,\alpha)$$

in the R_D region and

$$U'_{ND}(P,\alpha) = (1-\epsilon_{th})\,g(P,\alpha) + \epsilon_{th}\,h(P,\alpha)$$

in the R_ND region. We emphasize that the function U' is the true expected utility. Consider two different cases:

• (P, α) and (P', α') both lie in the same region, say R_D. Then, the maximum benefit to be gained from deviation is U'_D(P', α') − U'_D(P, α), which is

$$(1-\epsilon_{th})\bigl(f(P',\alpha') - f(P,\alpha)\bigr) + \epsilon_{th}\bigl(h(P',\alpha') - h(P,\alpha)\bigr).$$

Also, since U_D(P, α) ≥ U_D(P', α'), we have

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha')\bigr) \ge (1-\epsilon_{th})\bigl(f(P',\alpha') - f(P,\alpha)\bigr).$$

Thus, the maximum benefit is upper bounded by

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha') + h(P',\alpha') - h(P,\alpha)\bigr).$$

The upper bound is the same for the non-deterred case, since there the function f is replaced by g in both U and U', and exactly the same calculation yields the same bound.

• (P, α) and (P', α') lie in different regions, say R_D and R_ND respectively. Then, the maximum benefit to be gained from deviation is U'_ND(P', α') − U'_D(P, α), which is

$$(1-\epsilon_{th})\bigl(g(P',\alpha') - f(P,\alpha)\bigr) + \epsilon_{th}\bigl(h(P',\alpha') - h(P,\alpha)\bigr).$$

Also, since U_D(P, α) ≥ U_ND(P', α'), we have

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha')\bigr) \ge (1-\epsilon_{th})\bigl(g(P',\alpha') - f(P,\alpha)\bigr).$$

Thus, the maximum benefit is upper bounded by

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha') + h(P',\alpha') - h(P,\alpha)\bigr).$$

Now suppose that (P, α) and (P', α') lie in R_ND and R_D respectively. Then, the maximum benefit to be gained from deviation is U'_D(P', α') − U'_ND(P, α), which is

$$(1-\epsilon_{th})\bigl(f(P',\alpha') - g(P,\alpha)\bigr) + \epsilon_{th}\bigl(h(P',\alpha') - h(P,\alpha)\bigr).$$

Also, since U_ND(P, α) ≥ U_D(P', α'), we have

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha')\bigr) \ge (1-\epsilon_{th})\bigl(f(P',\alpha') - g(P,\alpha)\bigr).$$

Thus, the maximum benefit is upper bounded by

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha') + h(P',\alpha') - h(P,\alpha)\bigr).$$


The above cases show that the upper bound on the profit from deviation in one round is always

$$\epsilon_{th}\bigl(g(P,\alpha) - g(P',\alpha') + h(P',\alpha') - h(P,\alpha)\bigr).$$

Using the definition of expected rewards, we have

$$g(P,\alpha) = -C(\alpha_t a_t) - e(P_t),$$

$$h(P,\alpha) = -C(\alpha_t a_t) - e(P_t) - \sum_{j=0}^{m-1}\delta^{j}\,E(r(O_t,j)).$$

Note that for any P, α,

$$h(P,\alpha) - g(P,\alpha) = -\sum_{j=0}^{m-1}\delta^{j}\,E(r(O_t,j)) \le 0,$$
thus, dropping the non-positive term h(P', α') − g(P', α'), the upper bound above is further bounded by

$$\epsilon_{th}\bigl(g(P,\alpha) - h(P,\alpha)\bigr),$$

which is given by

$$\epsilon_{th}\sum_{j=0}^{m-1}\delta^{j}\,E(r(O_t,j)).$$

Observe that this term is maximized over the choice of D_0 when D_0 places all probability mass on a_t (for any α), i.e., v_t = a_t. Also, the expected value of r should be increasing in v_t (since a higher v_t means more detected violations), and v_t = a_t takes a maximum value of U_k for the game G_{A,k}. Thus, the above term is upper bounded by

$$\epsilon_{th}\max_{\alpha\in[0,1]}\sum_{j=0}^{m-1}\delta^{j}\,E(r(O_t,j))[U_k,U_k,\alpha].$$

Now, add the two bounds to get the maximum profit from deviation in one round. Further, using Property 1 and noting that the strategy is history-independent for G_{A,k}, we obtain the desired result. □

Claim 1. Assume the function instantiations from Section 3.3. Thus, given ν(α) = μα − (μ − 1)α², we must have μ ≤ 2. Further, assuming C + 2(R_int − R_ext p) ≥ 0 and R_int ≤ R_ext p, the ε_{A,k} from Theorem 1 is given by ε_th U_k max(R_int, R_ext p) + ΔI_{A,k}, where ΔI_{A,k} is

2"mi"6>a -m-si)) 

Using  values  from  end  of  Section  3  we  can  get  ethUk  RextP )  =  0.03  *  40  *  150  =  180,  also,  the  minimum 

in  A  Ia,u  is  for  e/p  =  20.  Assuming,  SI  =  0.5  (remember  i  o  =  1,  assume  the  learning  reduces  region  of  uncertainty 
by  half),  we  have  A Ia,u  =  20.  Thus,  we  get  caa  =  $200. 
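As a sanity check on the arithmetic in Claim 1, the following sketch evaluates caa $= e_{th}U_k\max(R_{int}, R_{ext}p) + \Delta_{I,A,k}$. Only $e_{th} = 0.03$, $U_k = 40$, $R_{ext}p = 150$, $e/p = 20$ and $\delta I = 0.5$ come from the text; $R_{int}$, $C$, $\mu$, $I$ and the split $e = 10$, $p = 0.5$ are hypothetical values chosen so that the assumptions of Claim 1 hold and the $e/p$ term attains the minimum.

```python
def caa_claim1(e_th: float, U_k: float, R_int: float, R_ext: float, p: float,
               C: float, mu: float, e: float, I: float, delta_I: float) -> float:
    """caa = e_th*U_k*max(R_int, R_ext*p) + Delta_{I,A,k}, following Claim 1."""
    R_ext_p = R_ext * p
    # Hypotheses of Claim 1.
    if not (mu <= 2 and R_int < R_ext_p and C + 2 * (R_int - R_ext_p) > 0):
        raise ValueError("parameters violate the assumptions of Claim 1")
    # Keep the denominators below positive.
    if not (0 < p < 1 and 0 <= delta_I < I):
        raise ValueError("need 0 < p < 1 and 0 <= delta_I < I")
    base = e_th * U_k * max(R_int, R_ext_p)
    delta = 2 * delta_I * min(e / p, C * U_k / (mu * (1 - p) * (I - delta_I)))
    return base + delta


# e_th, U_k, R_ext*p = 150 and delta_I follow the text; R_int, C, mu, I,
# and the split of e/p = 20 into e = 10, p = 0.5 are hypothetical.
caa = caa_claim1(e_th=0.03, U_k=40, R_int=140, R_ext=300, p=0.5,
                 C=30, mu=1.5, e=10, I=1.0, delta_I=0.5)
print(round(caa, 6))  # 200.0
```

With these hypothetical values the second term inside the minimum is 3200, so the $e/p$ term is the binding one, as in the text.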

Proof. Note that for $\nu(\alpha) \leq 1$ to hold, it must be that $\mu\alpha - (\mu - 1)\alpha^2 \leq 1$ for $\alpha \in [0, 1]$. Since $1 - \nu(\alpha) = (1 - \alpha)\big(1 - (\mu - 1)\alpha\big)$, it can be readily verified that this happens only when $\mu \leq 2$. Recall that the linear functions assumption means $C(s_t) = C s_t$ and $e(P_t) = e P_t$.


$$\max_{\alpha \in [0,1]} \sum_{j=0}^{m-1} \delta^j E\big(r(O_t, j)\big)[U_k, U_k, \alpha] = U_k R_{ext}p + U_k \max_{\alpha \in [0,1]} (R_{int} - R_{ext}p)\,\nu(\alpha).$$

The relevant part to maximize can be expanded as

$$(R_{int} - R_{ext}p)\big(\mu\alpha - (\mu - 1)\alpha^2\big).$$

For $\mu \leq 2$, $\mu\alpha - (\mu - 1)\alpha^2$ is nondecreasing in $\alpha \in [0, 1]$ (its derivative $\mu - 2(\mu - 1)\alpha$ is nonnegative on $[0, 1]$). Thus, if $R_{int} > R_{ext}p$, then $\alpha = 1$ is the maximizer; otherwise, $\alpha = 0$ is the maximizer. Since $\nu(0) = 0$ and $\nu(1) = 1$, the maximum value is

$$U_k \max(R_{int}, R_{ext}p).$$
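The two facts about $\nu$ used above ($\nu(\alpha) \leq 1$ on $[0, 1]$ only when $\mu \leq 2$, and the reputation term being maximized at an endpoint) can be spot-checked numerically. This is a grid-based illustration with hypothetical parameter values, not a proof; the helper names are illustrative.

```python
def nu(alpha: float, mu: float) -> float:
    """Internal detection function nu(alpha) = mu*alpha - (mu - 1)*alpha**2."""
    return mu * alpha - (mu - 1) * alpha ** 2


def grid(n: int = 1001) -> list:
    """n evenly spaced points on [0, 1], including both endpoints."""
    return [i / (n - 1) for i in range(n)]


def max_reputation_term(U_k, R_int, R_ext_p, mu):
    """Grid maximum over alpha of U_k*R_ext_p + U_k*(R_int - R_ext_p)*nu(alpha)."""
    return max(U_k * R_ext_p + U_k * (R_int - R_ext_p) * nu(a, mu)
               for a in grid())


# nu stays at most 1 and is nondecreasing on the grid for mu <= 2 ...
for mu in (1.0, 1.5, 2.0):
    vals = [nu(a, mu) for a in grid()]
    assert max(vals) <= 1 + 1e-12
    assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))
# ... but exceeds 1 somewhere once mu > 2.
assert max(nu(a, 2.5) for a in grid()) > 1

# The maximum equals U_k * max(R_int, R_ext_p), attained at alpha = 0 or 1.
print(max_reputation_term(40, 140, 150, 1.5))  # 6000.0
print(max_reputation_term(40, 200, 150, 1.5))  # 8000.0
```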

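Now observe that, since $1 \leq \mu \leq 2$, we have $2 \geq \mu \geq \mu(\alpha) \geq 1$ for $\alpha \in [0, 1]$, where $\mu(\alpha) = \mu - (\mu - 1)\alpha$, so that $\nu(\alpha) = \mu(\alpha)\,\alpha$. The utility function maximized by the organization, given the linear functions and the example reputation function, is (using simple notation)

$$-e_{th}R_{ext}p\,a_t - eP_t - \alpha_t a_t C - \nu(\alpha_t)\,a_t\,e_{th}(R_{int} - R_{ext}p),$$

$$-R_{ext}p\,a_t - eP_t - a_t\alpha_t C - \nu(\alpha_t)\,a_t(R_{int} - R_{ext}p)$$

in the reduced deterred region and the reduced non-deterred region, respectively. Observe that for the non-deterred case, the assumption $C + 2(R_{int} - R_{ext}p) > 0$ implies $C + \mu(\alpha)(R_{int} - R_{ext}p) > 0$, since $2 \geq \mu \geq \mu(\alpha) \geq 1$, $R_{int} < R_{ext}p$, and the quantities $C$, $R_{int}$ and $R_{ext}p$ are all positive. Writing the non-deterred utility as $-R_{ext}p\,a_t - eP_t - a_t\alpha_t\big(C + \mu(\alpha_t)(R_{int} - R_{ext}p)\big)$ shows that both subtracted terms $eP_t$ and $a_t\alpha_t\big(C + \mu(\alpha_t)(R_{int} - R_{ext}p)\big)$ are nonnegative and vanish at $(P_t, \alpha_t) = (0, 0)$. Thus, the maximizer in the non-deterred region is always $(0, 0)$, irrespective of the value of $I$; hence, the difference in costs is zero between the case when $I$ is known perfectly and the case when there is an error $\delta I$.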
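The claim that the non-deterred maximizer is $(P_t, \alpha_t) = (0, 0)$ can be illustrated with a brute-force grid search over the non-deterred region $P(\nu(\alpha) + p(1 - \nu(\alpha))) < I$. All numeric values below, including the grid bound `P_max`, are hypothetical and chosen to satisfy $C + 2(R_{int} - R_{ext}p) > 0$ and $R_{int} < R_{ext}p$.

```python
def nu(alpha: float, mu: float) -> float:
    return mu * alpha - (mu - 1) * alpha ** 2


def non_deterred_utility(P, alpha, a_t, R_int, R_ext_p, C, e, mu):
    """Organization's utility in the reduced non-deterred region (linear costs)."""
    return (-R_ext_p * a_t - e * P - a_t * alpha * C
            - nu(alpha, mu) * a_t * (R_int - R_ext_p))


def best_non_deterred_point(a_t, R_int, R_ext_p, C, e, mu, p, I,
                            P_max=5.0, n=101):
    """Grid search over points with P*(nu + p*(1 - nu)) < I (non-deterred)."""
    best = None
    for i in range(n):
        P = P_max * i / (n - 1)
        for j in range(n):
            alpha = j / (n - 1)
            v = nu(alpha, mu)
            if P * (v + p * (1 - v)) >= I:  # deterred point: skip it
                continue
            u = non_deterred_utility(P, alpha, a_t, R_int, R_ext_p, C, e, mu)
            if best is None or u > best[0]:
                best = (u, P, alpha)
    return best


u, P, alpha = best_non_deterred_point(a_t=40, R_int=140, R_ext_p=150, C=30,
                                      e=10, mu=1.5, p=0.5, I=1.0)
print(P, alpha)  # 0.0 0.0
```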

Assume $U^{D+\delta I}_{\max}$ occurs at a point $(P', \alpha')$ and $U^{D}_{\max}$ occurs at a point $(P, \alpha)$, and that the learned value of the personal benefit is $I$. The interesting case is when $(P', \alpha') \neq (P, \alpha)$ and $(P, \alpha)$ lies on the curve defined by $I + \delta I$. Then suppose $(P', \alpha')$ lies on the curve defined by $I + \delta I - \zeta$ for $2\delta I \geq \zeta \geq 0$. Suppose $(P', \alpha'')$ and $(P'', \alpha')$ are points on the curve defined by $I + \delta I$ obtained by drawing axis-parallel lines from the point $(P', \alpha')$. Thus, $P' \leq P''$ and $\alpha' \leq \alpha''$. Note that since $P'\big(\nu(\alpha') + p(1 - \nu(\alpha'))\big) = I + \delta I - \zeta$ and $\nu(\alpha), p \leq 1$, we can claim that $P' \geq I + \delta I - \zeta \geq I - \delta I$. Then we have

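$$\zeta = P''\big(\nu(\alpha') + p(1 - \nu(\alpha'))\big) - P'\big(\nu(\alpha') + p(1 - \nu(\alpha'))\big),$$

or

$$P'' - P' = \frac{\zeta}{\nu(\alpha')(1 - p) + p} \leq \frac{\zeta}{p}.$$

Also,

$$\zeta = P'\big(\nu(\alpha'') + p(1 - \nu(\alpha''))\big) - P'\big(\nu(\alpha') + p(1 - \nu(\alpha'))\big),$$

or

$$\nu(\alpha'') - \nu(\alpha') = \frac{\zeta}{(1 - p)P'} \leq \frac{\zeta}{(1 - p)(I - \delta I)}.$$

Note that

$$\nu(\alpha'') - \nu(\alpha') = \mu(\alpha'' - \alpha') - (\mu - 1)(\alpha'' - \alpha')(\alpha'' + \alpha');$$

thus, $\nu(\alpha'') - \nu(\alpha') \geq \mu(\alpha'' - \alpha')$, and hence

$$\alpha'' - \alpha' \leq \frac{\zeta}{\mu(1 - p)(I - \delta I)}.$$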
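The construction of $(P'', \alpha')$ and $(P', \alpha'')$ from $(P', \alpha')$ can be made explicit: $P''$ follows by division, and $\alpha''$ by inverting the quadratic $\nu$. The sketch below assumes $0 < p < 1$ and $0 < \mu \leq 2$, uses hypothetical parameter values, and checks the two identities for $P'' - P'$ and $\nu(\alpha'') - \nu(\alpha')$ numerically; the helper names are illustrative.

```python
import math


def nu(alpha: float, mu: float) -> float:
    return mu * alpha - (mu - 1) * alpha ** 2


def nu_inverse(t: float, mu: float) -> float:
    """Smallest alpha >= 0 with nu(alpha) = t, for 0 <= t <= 1 and 0 < mu <= 2."""
    # Rationalized root of (mu - 1)*alpha**2 - mu*alpha + t = 0; also valid at mu = 1.
    return 2 * t / (mu + math.sqrt(mu * mu - 4 * (mu - 1) * t))


def project_to_curve(P1, a1, p, mu, target):
    """Return P'' and alpha'' such that (P'', alpha') and (P', alpha'') lie on
    the curve P*(nu(alpha) + p*(1 - nu(alpha))) = target."""
    q1 = nu(a1, mu) + p * (1 - nu(a1, mu))
    P2 = target / q1                                   # move in P at fixed alpha'
    a2 = nu_inverse((target / P1 - p) / (1 - p), mu)   # move in alpha at fixed P'
    return P2, a2


I, dI, p, mu = 1.0, 0.5, 0.3, 1.5   # hypothetical parameters
P1, a1 = 2.0, 0.3                    # hypothetical (P', alpha')
q1 = nu(a1, mu) + p * (1 - nu(a1, mu))
zeta = I + dI - P1 * q1              # (P', alpha') lies on the I + dI - zeta curve
assert 0 <= zeta <= 2 * dI
P2, a2 = project_to_curve(P1, a1, p, mu, I + dI)
assert math.isclose(P2 - P1, zeta / q1) and P2 - P1 <= zeta / p
assert math.isclose(nu(a2, mu) - nu(a1, mu), zeta / ((1 - p) * P1))
```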


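Also, $U^{D}(P, \alpha) \geq U^{D}(P'', \alpha')$ and $U^{D}(P, \alpha) \geq U^{D}(P', \alpha'')$, and $U^{D+\delta I}_{\max} - U^{D}_{\max} = U^{D+\delta I}(P', \alpha') - U^{D}(P, \alpha)$, which means that

$$U^{D+\delta I}_{\max} - U^{D}_{\max} \leq \min\Big(U^{D+\delta I}(P', \alpha') - U^{D}(P'', \alpha'),\ U^{D+\delta I}(P', \alpha') - U^{D}(P', \alpha'')\Big).$$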


Also, $U^{D+\delta I}(P', \alpha') - U^{D}(P'', \alpha')$ is given by

$$-e(P' - P'') \leq \frac{e\,\zeta}{p}.$$


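Also, $U^{D+\delta I}(P', \alpha') - U^{D}(P', \alpha'')$ is given by

$$-a_t(\alpha' - \alpha'')C - a_t\big(\nu(\alpha') - \nu(\alpha'')\big)e_{th}(R_{int} - R_{ext}p),$$

which can be simplified to

$$a_t(\alpha'' - \alpha')\Big(C + \big(\mu - (\mu - 1)(\alpha'' + \alpha')\big)e_{th}(R_{int} - R_{ext}p)\Big).$$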

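Using the result $1 \leq \mu \leq 2$, we have $2 \geq \mu - (\mu - 1)(\alpha'' + \alpha') \geq 0$. Using the assumption $C + 2(R_{int} - R_{ext}p) > 0$, we can say that $C + \big(\mu - (\mu - 1)(\alpha'' + \alpha')\big)e_{th}(R_{int} - R_{ext}p) > 0$. Also, since $R_{int} < R_{ext}p$, we have $C + \big(\mu - (\mu - 1)(\alpha'' + \alpha')\big)e_{th}(R_{int} - R_{ext}p) \leq C$. Thus, using the inequalities above and $\zeta \leq 2\delta I$, $U^{D+\delta I}_{\max} - U^{D}_{\max}$ is less than

$$2\,\delta I \min\left(\frac{e}{p},\ \frac{C\,a_t}{\mu(1 - p)(I - \delta I)}\right),$$

which is maximized for $a_t = U_k$. □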
B.1 Repeated Product Game - Definition and Results

If two players play multiple (repeated) independent games in parallel, then it is possible to consider a composition of these games, which is itself a (repeated) game. By independent games, we mean that each of these games is played without any influence from the other games played in parallel. We define the composition below for a repeated game.

Definition 2. (Repeated Product Games) Let the two players play the independent one-shot stage games $G_1, G_2, \ldots, G_n$ in parallel in each round of the corresponding $n$ repeated games. A composition of the $n$ stage games is a single-shot game $G$ given by player $i$'s ($i = 1, 2$) action space $S_i = S_{1i} \times S_{2i} \times \cdots \times S_{ni}$ and the payoff function $r_i(s_1, s_2) = \sum_{j=1}^{n} r_{ji}(s_{j1}, s_{j2})$, where $s_{ji} \in S_{ji}$ and $s_i = (s_{1i}, \ldots, s_{ni}) \in S_i$. A repeated product game is a repeated game with the stage game in every round given by $G$.

We can extend the above definition to games with imperfect monitoring and public signaling, in the same manner in which a standard repeated game is extended. Observe that any strategy $\sigma$ of a repeated product game can be decomposed into strategies $\sigma_1, \ldots, \sigma_n$ of the component games, because of the independence assumption on the component games. This decomposition leads to the following useful results, summarized in the lemma below:

Lemma 3. Let $RG$ be a repeated product game with the stage game given by $G$, such that $G$ is a parallel composition of $G_1, G_2, \ldots, G_n$ as defined in Definition 2. Consider a strategy $\sigma$ of $RG$ with the decomposition into strategy $\sigma_i$ for each component game. Then

• The strategy $\sigma$ is an SPE iff the strategy $\sigma_i$ is an SPE for the repeated game with stage game $G_i$ for all $i$.

• The strategy $\sigma$ is a $\left(\sum_{i=1}^{n} \epsilon_{1i}, \sum_{i=1}^{n} \epsilon_{2i}\right)$-SPE if the strategy $\sigma_i$ is an $(\epsilon_{1i}, \epsilon_{2i})$-SPE for the repeated game with stage game $G_i$ for all $i$.

Proof. For the first case, assume $\sigma_i$ is an SPE for the repeated game with stage game $G_i$ for all $i$. Since $\sigma$ is given by $\sigma_1, \ldots, \sigma_n$, any unilateral deviation from $\sigma$ results in a unilateral deviation from one or more of $\sigma_1, \ldots, \sigma_n$. By assumption, none of these component deviations is profitable in the repeated game given by its stage game $G_j$. Since the payoff in $G$ is the sum of the payoffs in $G_1, \ldots, G_n$, and the payoffs of the games in which the player does not deviate remain the same, the deviation is not profitable for $G$ either.

The other direction is very similar. Assume $\sigma$ is an SPE. Since $\sigma$ is given by $\sigma_1, \ldots, \sigma_n$, any unilateral deviation from $\sigma_j$ results in a unilateral deviation from $\sigma$. By assumption, that deviation is not profitable for the repeated game given by the stage game $G$. Since the payoff in $G$ is the sum of the payoffs in $G_1, \ldots, G_n$, and the payoffs of the games other than the $j$th game remain the same, the deviation is not profitable for $G_j$ either.

Next, for the second part, since the payoff of $G$ is the sum of the payoffs of the $G_i$'s, and any unilateral deviation from $\sigma$ results in a unilateral deviation from one or more of $\sigma_1, \ldots, \sigma_n$, it is not difficult to check that the profit from deviation will be no more than the sum of the profits from deviation in each of the repeated games defined by $G_1, \ldots, G_n$. Thus, the maximum profit from deviation for player $j$ is $\sum_{i=1}^{n} \epsilon_{ji}$. □
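To make Definition 2 concrete, the sketch below composes finite one-shot stage games by taking the product of the action sets and summing the payoffs. It only illustrates the stage-game analogue of the additivity used in the second part of Lemma 3 (the one-shot deviation gain in $G$ equals the sum of the component gains); it is not a checker for subgame perfection in the repeated game, and the prisoner's-dilemma payoffs and helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Payoff = Tuple[float, float]
# A finite one-shot stage game: (actions of player 1, actions of player 2, r(s1, s2)).
StageGame = Tuple[list, list, Callable[[object, object], Payoff]]


def table_game(table: Dict[Tuple[str, str], Payoff]) -> StageGame:
    """Build a stage game from a payoff table {(s1, s2): (r1, r2)}."""
    s1 = sorted({a for a, _ in table})
    s2 = sorted({b for _, b in table})
    return s1, s2, lambda a, b: table[(a, b)]


def compose(games: List[StageGame]) -> StageGame:
    """Parallel composition: S_i = S_1i x ... x S_ni, payoffs summed componentwise."""
    S1 = list(product(*(g[0] for g in games)))
    S2 = list(product(*(g[1] for g in games)))

    def r(s1, s2):
        pays = [g[2](a, b) for g, a, b in zip(games, s1, s2)]
        return (sum(x[0] for x in pays), sum(x[1] for x in pays))

    return S1, S2, r


def deviation_gain(game: StageGame, s1, s2, player: int) -> float:
    """Largest one-shot gain for `player` (0 or 1) from a unilateral deviation."""
    S1, S2, r = game
    base = r(s1, s2)[player]
    if player == 0:
        return max(r(d, s2)[0] for d in S1) - base
    return max(r(s1, d)[1] for d in S2) - base


PD = {("C", "C"): (3, 3), ("C", "D"): (0, 4), ("D", "C"): (4, 0), ("D", "D"): (1, 1)}
pd = table_game(PD)
pd2 = table_game({k: (2 * v[0], 2 * v[1]) for k, v in PD.items()})
G = compose([pd, pd2])
print(deviation_gain(G, ("C", "C"), ("C", "C"), 0))  # 3
```

Here the gains in the two components are 1 and 2, and the gain in the composed game is their sum.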

B.2 Determining $P_f$ and Punishment

In addition to the action-dependent utilities above, the players also get a fixed utility in each round of $G_A$, which is the salary $\mathit{Sal}_A$ for $A$ and the value created by the employee, $g \times \mathit{Sal}_A$, for $D$. Note that these are the salary and the value created for the duration of one audit cycle. Also, note that this fixed utility is not part of any game $G_{A,k}$. Let $R_k$ be the maximum loss of reputation possible for a violation of type $k$. We assume that the maximum punishment rate $P_{f,A}(k)$ for each type $k$ is proportional to $R_k$. Since the employee can make mistakes, in the worst case he can lose an expected amount of $e_{th}\sum_k P_{f,A}(k)U_k$. This loss must be less than a fixed fraction $\eta_A$ of $\mathit{Sal}_A$, or else the employee is better off quitting and getting a better expected payoff in every round in some other job. Thus, we must have $e_{th}\sum_k P_{f,A}(k)U_k = \eta_A \cdot \mathit{Sal}_A$, which yields

$$P_{f,A}(k) = \frac{R_k\,\eta_A \cdot \mathit{Sal}_A}{e_{th}\sum_k R_k U_k}.$$

Observe that an employee with a higher salary can be punished more. For example, suppose $A$ does two types $k, k'$ of tasks such that in every week $U_k = 40$, $U_{k'} = 10$, $R_{k'} = 12R_k$, and $\eta_A = 0.1$, with a weekly salary of $\$500$. Then, with $e_{th} = 0.03$, $P_f(k) = 0.1 \times 500 / \big(0.03 \times (40 + 12 \times 10)\big) \approx 10.4$.
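The formula for $P_{f,A}(k)$ and the weekly example can be reproduced directly. The sketch assumes $e_{th} = 0.03$ (the value used in Claim 1) and normalizes $R_k = 1$, since only the ratio $R_{k'}/R_k = 12$ enters the example; the function name is illustrative.

```python
from typing import Dict


def max_punishment_rates(eta_A: float, salary_A: float, e_th: float,
                         U: Dict[str, float], R: Dict[str, float]) -> Dict[str, float]:
    """P_{f,A}(k) = R_k * eta_A * Sal_A / (e_th * sum_k R_k * U_k)."""
    denom = e_th * sum(R[k] * U[k] for k in U)
    return {k: R[k] * eta_A * salary_A / denom for k in U}


# Example from the text, with R_k normalized to 1 (only R_k'/R_k matters).
U = {"k": 40, "k'": 10}
R = {"k": 1.0, "k'": 12.0}
P_f = max_punishment_rates(eta_A=0.1, salary_A=500, e_th=0.03, U=U, R=R)
print(round(P_f["k"], 2), round(P_f["k'"], 2))  # 10.42 125.0
```

By construction, the worst-case expected loss $e_{th}\sum_k P_{f,A}(k)U_k$ then equals $\eta_A \cdot \mathit{Sal}_A = 50$.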

Next, consider the case in which the employee is non-deterred for violations of type $k$. Suppose the expected loss to the organization in every round for such a case is at most $U_k L_k$, where $L_k$ is the maximum per-violation cost (dependent on $\alpha$) that can be calculated from our model. In such a case, if $U_k L_k > (g - 1)\mathit{Sal}_A$, then the organization obtains no benefit from employing $A$. Thus, in such a case the organization must fire the employee.
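The firing condition is a single comparison per violation type. In the sketch below, the values of $L_k$ and $g$ are hypothetical; in the text, $L_k$ would be computed from the model, and the function name is illustrative.

```python
def organization_should_fire(U_k: float, L_k: float, g: float,
                             salary_A: float) -> bool:
    """True when the worst-case per-round loss U_k*L_k from a non-deterred
    employee exceeds the net value (g - 1)*Sal_A the employee creates."""
    return U_k * L_k > (g - 1) * salary_A


print(organization_should_fire(U_k=40, L_k=10, g=1.5, salary_A=500))  # True
print(organization_should_fire(U_k=40, L_k=5, g=1.5, salary_A=500))   # False
```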

B.3  Budget  Optimization  Problem 

If the overall budget in an audit cycle is given by $B$, then we must have $\sum_{A,k} b_A(k) \leq B$. Further, let $f(b_A(k))$ denote the expected utility in game $G_{A,k}$ from the equilibrium computed in Section 4. Note that the maxima of the cost functions in the deterred and non-deterred regions are continuous in $b_A(k)$, since the regions themselves change continuously with $b_A(k)$. Since the equilibrium utility involves taking the maximum of two continuous functions, and the maximum of two continuous functions is continuous, $f(b_A(k))$ is continuous. Thus, the optimal allocation of the budget is obtained by solving the following non-linear optimization problem:

$$\text{maximize} \sum_{A,k} f\big(b_A(k)\big) \quad \text{subject to} \quad \sum_{A,k} b_A(k) \leq B.$$
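The text only establishes that $f$ is continuous, not concave, so one simple (if coarse) way to approximate the optimal allocation is to discretize each $b_A(k)$ and solve the resulting separable problem with a knapsack-style dynamic program. This is a swapped-in technique for illustration, not the method of this paper; the utility functions below are hypothetical stand-ins for $f(b_A(k))$.

```python
from typing import Callable, Dict, List, Tuple


def allocate_budget(utilities: Dict[str, Callable[[float], float]],
                    B: float, step: float) -> Tuple[float, Dict[str, float]]:
    """Maximize sum_g f_g(b_g) subject to sum_g b_g <= B, with each b_g
    restricted to the grid {0, step, 2*step, ...} (knapsack-style DP)."""
    units = int(B // step)
    # best[u]: (value, allocation) over the games seen so far using <= u units.
    best: List[Tuple[float, Dict[str, float]]] = [(0.0, {})] * (units + 1)
    for name, f in utilities.items():
        new_best = []
        for u in range(units + 1):
            # Give x units to this game and at most u - x units to earlier ones.
            candidates = [(best[u - x][0] + f(x * step),
                           {**best[u - x][1], name: x * step})
                          for x in range(u + 1)]
            new_best.append(max(candidates, key=lambda c: c[0]))
        best = new_best
    return best[units]


# Hypothetical stand-ins for f(b_A(k)); in the text, f comes from the equilibrium.
f_games = {"A1,k1": lambda b: -50.0 / (1.0 + b),
           "A2,k1": lambda b: -20.0 / (1.0 + b)}
value, allocation = allocate_budget(f_games, B=3.0, step=1.0)
print(allocation)  # {'A1,k1': 2.0, 'A2,k1': 1.0}
```

The running time grows as $O(n \cdot (B/\text{step})^2)$ for $n$ games, so a finer grid trades computation for accuracy; because leftover budget is allowed, the sketch also handles utilities that are not monotone in $b_A(k)$.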